defaultdict: The Evolution of a Condition-Free Dictionary

Python has a vast ecosystem of external libraries, but a solid understanding of the standard library alone is enough to write powerful code in practice. In this post, we'll take an in-depth look at collections.defaultdict.

Through this article, you'll not only be introduced to the concept but also gain a clear understanding of when, why, and how to use defaultdict.

If you're curious about the first article in the collections series, which covers the Counter class, I recommend reading it: "Mastering the Python Standard Library ① - collections.Counter".


1. Basic Concept: What is defaultdict?


defaultdict is a subclass of the built-in dictionary (dict), provided by the collections module in the Python standard library. Whereas accessing a key that doesn't exist in a regular dictionary raises a KeyError, defaultdict lets you specify a factory function that generates a default value for the missing key automatically, which keeps your code cleaner and avoids that error.


2. Basic Usage

from collections import defaultdict

d = defaultdict(int)
d['apple'] += 1
print(d)  # defaultdict(<class 'int'>, {'apple': 1})

Here, int() returns 0 as a default value. When accessing the non-existent key 'apple', 0 is automatically generated without raising a KeyError, and then +1 is applied.
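To make the mechanics concrete, here is a minimal sketch that contrasts a regular dict with defaultdict(int) and spells out `d['apple'] += 1` as two separate steps. The names `plain`, `error_key`, and `current` are illustrative and not part of the original example.

```python
from collections import defaultdict

plain = {}
try:
    plain['apple'] += 1          # reading the missing key fails before the add
except KeyError as exc:
    error_key = exc.args[0]

d = defaultdict(int)
print(d.default_factory is int)  # True: the factory is stored on the instance

# d['apple'] += 1 is roughly equivalent to the two steps below.
current = d['apple']             # missing key: int() is called and 0 is inserted
d['apple'] = current + 1

print(error_key)   # apple
print(dict(d))     # {'apple': 1}
```

The regular dict fails on the read, while the defaultdict fills in 0 first and only then applies the addition.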


3. Examples of Various Default Values

from collections import defaultdict

# Default value: 0 (int)
counter = defaultdict(int)
counter['a'] += 1
print(counter)  # defaultdict(<class 'int'>, {'a': 1})

# Default value: empty list
group = defaultdict(list)
group['fruit'].append('apple')
group['fruit'].append('banana')
print(group)  # defaultdict(<class 'list'>, {'fruit': ['apple', 'banana']})

# Default value: empty set
unique_tags = defaultdict(set)
unique_tags['tags'].add('python')
unique_tags['tags'].add('coding')
print(unique_tags)  # defaultdict(<class 'set'>, {'tags': {'python', 'coding'}})  (set order may vary)

# Default value: custom initial value
fixed = defaultdict(lambda: 100)
print(fixed['unknown'])  # 100
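One detail worth spelling out: the argument to defaultdict is a zero-argument callable, not a value, and it is called again for every missing key. The sketch below illustrates both points; `buckets` and `rejected` are illustrative names only.

```python
from collections import defaultdict

# Each missing key triggers a fresh call to the factory,
# so the two lists below are independent objects.
buckets = defaultdict(list)
buckets['a'].append(1)
buckets['b'].append(2)
print(buckets['a'] is buckets['b'])  # False

# The factory must be callable (or None); a bare value is rejected.
rejected = False
try:
    defaultdict(100)
except TypeError:
    rejected = True
print(rejected)  # True

# Wrapping the constant in a zero-argument callable works.
fixed = defaultdict(lambda: 100)
print(fixed['unknown'])  # 100
```

This is why a constant default is written as `lambda: 100` rather than `100`.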

4. Practical Examples

1. Counting Word Frequency

words = ['apple', 'banana', 'apple', 'orange', 'banana']
counter = defaultdict(int)

for word in words:
    counter[word] += 1

print(counter)
# defaultdict(<class 'int'>, {'apple': 2, 'banana': 2, 'orange': 1})

👉 Counter vs defaultdict
If you need statistics or rankings (for example, the most common words), collections.Counter is the better fit. For simple accumulation such as counting, defaultdict(int) is concise enough.
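As a rough comparison, the sketch below counts the same word list both ways. The names `manual` and `counted` are illustrative; `most_common()` is the Counter method that provides ranking.

```python
from collections import Counter, defaultdict

words = ['apple', 'banana', 'apple', 'orange', 'banana']

# defaultdict(int): manual accumulation
manual = defaultdict(int)
for word in words:
    manual[word] += 1

# Counter: the same counts, plus ranking helpers such as most_common()
counted = Counter(words)

print(dict(manual) == dict(counted))  # True
print(counted.most_common(1))         # [('apple', 2)]
```

Both produce the same counts. The difference is the extra tooling Counter brings once you need to rank or combine the results.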

2. Organizing Logs by Group

logs = [
    ('2024-01-01', 'INFO'),
    ('2024-01-01', 'ERROR'),
    ('2024-01-02', 'DEBUG'),
]

grouped = defaultdict(list)
for date, level in logs:
    grouped[date].append(level)

print(grouped)
# defaultdict(<class 'list'>, {'2024-01-01': ['INFO', 'ERROR'], '2024-01-02': ['DEBUG']})

3. Organizing Unique Tags

entries = [
    ('post1', 'python'),
    ('post1', 'coding'),
    ('post1', 'python'),
]

tags = defaultdict(set)
for post, tag in entries:
    tags[post].add(tag)

print(tags)
# defaultdict(<class 'set'>, {'post1': {'python', 'coding'}})  (set order may vary)

5. Cautions

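  • defaultdict stores its factory internally, so its repr() differs from that of a regular dict: it shows the factory before the contents.
  • This can cause problems when the object is logged, compared as text, or serialized. json.dumps() accepts a defaultdict because it is a dict subclass, but pickling fails when the factory is a lambda. It's safer to convert using dict(d) before passing the data on.
  • Default values are generated only when accessed via []. They will not be generated if accessed via get() or tested with in. Note that reading d[key] also inserts the key into the dictionary.

from collections import defaultdict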

d = defaultdict(list)
print(d.get('missing'))  # None
print(d['missing'])      # []
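The cautions above can be sketched in a few lines. This example shows which operations create keys, how to turn the automatic defaults off again by setting `default_factory` to None, and the conversion to a plain dict. The names `raised` and `plain` are illustrative.

```python
import json
from collections import defaultdict

d = defaultdict(list)

# Membership tests and get() never call the factory.
print('missing' in d)         # False
print(d.get('missing', []))   # []
print(len(d))                 # 0

# A plain read with [] inserts the key as a side effect.
_ = d['missing']
print(len(d))                 # 1

# Setting default_factory to None restores KeyError for missing keys.
d.default_factory = None
raised = False
try:
    d['other']
except KeyError:
    raised = True
print(raised)                 # True

# Convert to a plain dict before handing the data to other code.
plain = dict(d)
print(json.dumps(plain))      # {"missing": []}
```

Note that dict(d) converts only the top level. Nested defaultdicts inside it stay as they are.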

6. When is it Useful? – Three Unique Advantages of defaultdict

defaultdict significantly enhances readability, maintainability, and safety in situations where the pattern of dict + conditional statements is frequently used. Particularly in the following three situations, you will find yourself thinking, 'This is definitely a case for using defaultdict!'

6-1. Aggregating Counts/Accumulations Without Conditions

from collections import defaultdict

items = ['a', 'b', 'a']  # hypothetical sample input

# Regular dict
counts = {}
for item in items:
    if item not in counts:
        counts[item] = 0
    counts[item] += 1

# defaultdict
counts = defaultdict(int)
for item in items:
    counts[item] += 1

✔ The disappearance of conditional statements makes the code cleaner and reduces the chance of mistakes.
✔ It is particularly well-suited for processing large data volumes like log analysis and word counting.

6-2. Accumulating Lists/Sets as a Replacement for setdefault

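from collections import defaultdict

data = [('python', 'post1'), ('python', 'post2'), ('go', 'post3')]  # hypothetical (tag, post) pairs

# Regular dict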
posts = {}
for tag, post in data:
    if tag not in posts:
        posts[tag] = []
    posts[tag].append(post)

# defaultdict
posts = defaultdict(list)
for tag, post in data:
    posts[tag].append(post)

✔ Often more readable than setdefault(), and it keeps the code clean even within loops.
✔ It is well suited to data grouping.
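For reference, this is roughly what the setdefault() version of the same grouping looks like next to the defaultdict version. The sample `data` and the names `with_setdefault` and `with_defaultdict` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (tag, post) pairs
data = [('python', 'post1'), ('python', 'post2'), ('go', 'post3')]

# dict.setdefault: works, but builds a new empty list on every iteration,
# even when the key already exists.
with_setdefault = {}
for tag, post in data:
    with_setdefault.setdefault(tag, []).append(post)

# defaultdict: the factory runs only when a key is actually missing.
with_defaultdict = defaultdict(list)
for tag, post in data:
    with_defaultdict[tag].append(post)

print(with_setdefault == with_defaultdict)  # True
```

The results are identical. The defaultdict version simply moves the "what is the default?" decision to the place where the dictionary is created.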

6-3. Automating Initialization in Nested Dictionaries

from collections import defaultdict

# Regular dictionary
matrix = {}
if 'x' not in matrix:
    matrix['x'] = {}
matrix['x']['y'] = 10

# Nested defaultdict
matrix = defaultdict(lambda: defaultdict(int))  # Creates a defaultdict(int) automatically each time a key is missing
matrix['x']['y'] += 10

✔ You can easily create nested data structures, making it very advantageous for working with multidimensional dictionaries.
✔ It excels in applications such as data mining, parsing, and storing tree structures.

The outer factory, lambda: defaultdict(int), is called each time an outer key is missing and returns a new defaultdict(int), which is what allows the dictionaries to nest automatically.
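The same idea can be extended to arbitrary depth, which is one way to store tree structures. The sketch below is an illustration rather than something from the original article: `tree`, `to_plain`, and `config` are hypothetical names, and the factory simply refers back to itself.

```python
from collections import defaultdict

def tree():
    """Return a defaultdict whose missing keys become new trees."""
    return defaultdict(tree)

def to_plain(node):
    """Recursively convert nested defaultdicts to regular dicts."""
    if isinstance(node, defaultdict):
        return {key: to_plain(value) for key, value in node.items()}
    return node

config = tree()
config['server']['http']['port'] = 8080
config['server']['name'] = 'demo'

print(to_plain(config))
# {'server': {'http': {'port': 8080}, 'name': 'demo'}}
```

Because every level creates keys on read, a helper like `to_plain` is useful before printing or serializing the result.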


7. Conclusion

To beginners, collections.defaultdict may look like a minor extension of dict, but the more you use it, the more you realize it is a structural tool that makes your code clearer and safer.

  • It allows you to use dictionaries without worrying about KeyError.
  • You can group and accumulate data without conditional statements.
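  • It allows for intuitive composition of nested dictionaries.

from collections import defaultdict

# Example of handling data reliably and succinctly using defaultdict
records = [('dev', 5000), ('ops', 4000), ('dev', 6000)]  # hypothetical sample data
salaries = defaultdict(int)
for dept, amount in records:
    salaries[dept] += amount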

If a single line of code can improve readability and maintainability while preventing errors, defaultdict is not merely a convenience feature but a key tool for Pythonic thinking.

The next topic will be pathlib.
It offers a modern, object-oriented way to handle file and directory operations and is useful for beginners and intermediate developers alike. If you are used to os.path, you will likely find it much simpler. Stay tuned for the next installment.