defaultdict: The Evolution of a Condition-less Dictionary
Python boasts numerous external libraries, but a solid understanding of the standard library alone lets you write powerful code in practice. In this post, we'll take an in-depth look at collections.defaultdict.
By the end of this article, you'll understand not only the concept but also when, why, and how to use defaultdict.
If you're curious about the first article in the collections series, which covers the Counter class, I recommend reading the previous post: Mastering the Python Standard Library ① - collections.Counter
1. Basic Concept: What is defaultdict?
defaultdict is a subclass of the built-in dictionary (dict), provided by the collections module of the Python standard library. When you access a missing key in a regular dictionary, it raises a KeyError. A defaultdict instead calls the factory function you supplied (stored in its default_factory attribute) with no arguments. It stores the returned value under the missing key and returns it. This keeps your code cleaner and prevents errors.
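Under the hood, when a key is absent, dict lookups with [] call a __missing__ method on subclasses that define one. defaultdict implements exactly that hook. The sketch below is a simplified, hypothetical re-implementation to illustrate the idea. The class name SimpleDefaultDict is made up and is not part of the standard library. The real defaultdict is written in C and also validates the factory, supports copying and pickling, and has its own repr().

```python
class SimpleDefaultDict(dict):
    """A simplified, hypothetical imitation of collections.defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when `key` is absent.
        if self.default_factory is None:
            raise KeyError(key)
        # Call the factory with no arguments, store the result, and return it.
        value = self.default_factory()
        self[key] = value
        return value


d = SimpleDefaultDict(list)
d['fruit'].append('apple')
print(d)             # {'fruit': ['apple']}
print(d.get('veg'))  # None: get() does not call __missing__
```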
2. Basic Usage
from collections import defaultdict
d = defaultdict(int)
d['apple'] += 1
print(d) # defaultdict(<class 'int'>, {'apple': 1})
Here, int() returns 0, which serves as the default value. When the non-existent key 'apple' is accessed, 0 is generated automatically instead of raising a KeyError, and then += 1 is applied to it.
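To see the difference side by side, the short script below performs the same update on a regular dict and on a defaultdict. It also shows a side effect worth knowing: with defaultdict, merely reading a missing key with [] inserts it.

```python
from collections import defaultdict

plain = {}
try:
    plain['apple'] += 1      # a regular dict has no default, so this fails
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'apple'

d = defaultdict(int)
d['apple'] += 1              # missing key -> int() == 0 -> 0 + 1
print(dict(d))               # {'apple': 1}

# Reading a missing key also inserts the default value.
print(d['banana'])           # 0
print(dict(d))               # {'apple': 1, 'banana': 0}
```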
3. Examples of Various Default Values
from collections import defaultdict
# Default value: 0 (int)
counter = defaultdict(int)
counter['a'] += 1
print(counter) # defaultdict(<class 'int'>, {'a': 1})
# Default value: empty list
group = defaultdict(list)
group['fruit'].append('apple')
group['fruit'].append('banana')
print(group) # defaultdict(<class 'list'>, {'fruit': ['apple', 'banana']})
# Default value: empty set
unique_tags = defaultdict(set)
unique_tags['tags'].add('python')
unique_tags['tags'].add('coding')
print(unique_tags) # e.g. defaultdict(<class 'set'>, {'tags': {'python', 'coding'}}) (set order may vary)
# Default value: custom initial value
fixed = defaultdict(lambda: 100)
print(fixed['unknown']) # 100
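The factory is stored in the default_factory attribute. It can be any callable that takes no arguments: a type, a lambda, or a named function. If you pass nothing (or None), the object behaves like a regular dict and raises KeyError for missing keys. In the sketch below, new_profile is a hypothetical example function, not part of any library.

```python
from collections import defaultdict

def new_profile():
    # Hypothetical factory: called with no arguments for every missing key.
    return {'visits': 0, 'tags': []}

profiles = defaultdict(new_profile)
profiles['alice']['visits'] += 1
print(profiles.default_factory is new_profile)  # True
print(profiles['alice'])                        # {'visits': 1, 'tags': []}

no_factory = defaultdict()                      # default_factory is None
try:
    no_factory['missing']
except KeyError:
    print('KeyError, just like a regular dict')
```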
4. Practical Examples
1. Counting Word Frequency
from collections import defaultdict
words = ['apple', 'banana', 'apple', 'orange', 'banana']
counter = defaultdict(int)
for word in words:
    counter[word] += 1
print(counter)
# defaultdict(<class 'int'>, {'apple': 2, 'banana': 2, 'orange': 1})
👉 Counter vs defaultdict
When you need statistics or rankings, such as finding the most frequent words, collections.Counter is the better fit because it provides helpers like most_common(). For simple accumulation such as counting, however, defaultdict(int) is concise enough.
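For example, ranking words by frequency is a one-liner with Counter.most_common(). With defaultdict(int), you sort the items yourself. The sample word list below is made up for illustration. Both approaches arrive at the same counts:

```python
from collections import Counter, defaultdict

words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

counts = defaultdict(int)
for word in words:
    counts[word] += 1

# Counter builds the same tallies and adds ranking helpers.
ranked = Counter(words).most_common(2)
print(ranked)  # [('apple', 3), ('banana', 2)]

# With defaultdict, ranking means sorting by count yourself.
manual = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(manual)  # [('apple', 3), ('banana', 2)]

print(dict(Counter(words)) == dict(counts))  # True
```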
2. Organizing Logs by Group
from collections import defaultdict
logs = [
    ('2024-01-01', 'INFO'),
    ('2024-01-01', 'ERROR'),
    ('2024-01-02', 'DEBUG'),
]
grouped = defaultdict(list)
for date, level in logs:
    grouped[date].append(level)
print(grouped)
# defaultdict(<class 'list'>, {'2024-01-01': ['INFO', 'ERROR'], '2024-01-02': ['DEBUG']})
3. Organizing Unique Tags
from collections import defaultdict
entries = [
    ('post1', 'python'),
    ('post1', 'coding'),
    ('post1', 'python'),
]
tags = defaultdict(set)
for post, tag in entries:
    tags[post].add(tag)
print(tags)
# e.g. defaultdict(<class 'set'>, {'post1': {'python', 'coding'}}) (set order may vary)
5. Cautions
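- defaultdict stores a default-value factory internally, so its repr() differs from that of a regular dict (for example, defaultdict(<class 'list'>, {...}) instead of {...}).
- This can cause problems during serialization. The standard json module accepts a defaultdict because it is a dict subclass. Other serializers may expect a plain dict, though, and a defaultdict whose factory is a lambda cannot be pickled. Converting with dict(d) first is the safer choice, and it also stops downstream code from creating keys by accident.
- Default values are generated only when a key is accessed via []. get() does not generate them: it returns None (or the fallback you pass) and leaves the dictionary unchanged.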
from collections import defaultdict
d = defaultdict(list)
print(d.get('missing')) # None
print(d['missing']) # []
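One more detail: dict(d) converts only the top level, so nested defaultdicts stay defaultdicts inside the result. Below is a minimal sketch of a recursive conversion you might run before handing data to a serializer. The helper name to_plain_dict is hypothetical.

```python
import json
from collections import defaultdict

def to_plain_dict(obj):
    """Hypothetical helper: recursively turn (default)dicts into plain dicts.

    Non-dict values are returned unchanged (lists are not traversed).
    """
    if isinstance(obj, dict):  # defaultdict is a dict subclass
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj

stats = defaultdict(lambda: defaultdict(int))
stats['2024-01-01']['ERROR'] += 1
stats['2024-01-01']['INFO'] += 2

shallow = dict(stats)
print(type(shallow['2024-01-01']).__name__)  # defaultdict

plain = to_plain_dict(stats)
print(type(plain['2024-01-01']).__name__)    # dict
print(json.dumps(plain))  # {"2024-01-01": {"ERROR": 1, "INFO": 2}}
```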
6. When is it Useful? – Three Unique Advantages of defaultdict
defaultdict greatly improves readability, maintainability, and safety wherever you would otherwise pair a dict with conditional checks for missing keys. The following three situations in particular should make you think, 'This is definitely a case for defaultdict!'
6-1. Aggregating Counts/Accumulations Without Conditions
from collections import defaultdict
items = ['a', 'b', 'a', 'c', 'a']  # sample data
# Regular dict
counts = {}
for item in items:
    if item not in counts:
        counts[item] = 0
    counts[item] += 1
# defaultdict
counts = defaultdict(int)
for item in items:
    counts[item] += 1
✔ Without the membership check, the code is cleaner and leaves less room for mistakes.
✔ It is particularly well suited to aggregating large volumes of data, as in log analysis and word counting.
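For comparison, a regular dict can also avoid the explicit if by using dict.get() with a fallback value. The sketch below uses made-up sample data and checks that both approaches yield the same counts.

```python
from collections import defaultdict

items = ['a', 'b', 'a', 'c', 'a']  # sample data for illustration

# Regular dict: dict.get() supplies 0 when the key is missing.
counts_get = {}
for item in items:
    counts_get[item] = counts_get.get(item, 0) + 1

# defaultdict(int): int() supplies 0 when the key is missing.
counts_dd = defaultdict(int)
for item in items:
    counts_dd[item] += 1

print(counts_get)               # {'a': 3, 'b': 1, 'c': 1}
print(counts_get == counts_dd)  # True
```

The get() version works, but it repeats the dictionary name and the key on both sides of the assignment. That repetition is exactly the noise defaultdict removes.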
6-2. Accumulating Lists/Sets as a Replacement for setdefault
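from collections import defaultdict
data = [('python', 'post1'), ('python', 'post2'), ('coding', 'post3')]  # sample data
# Regular dict with an explicit check
posts = {}
for tag, post in data:
    if tag not in posts:
        posts[tag] = []
    posts[tag].append(post)
# Regular dict with setdefault()
posts = {}
for tag, post in data:
    posts.setdefault(tag, []).append(post)
# defaultdict
posts = defaultdict(list)
for tag, post in data:
    posts[tag].append(post)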
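✔ Much more intuitive than setdefault(), and it keeps the code clean even inside loops. Note that setdefault(tag, []) builds a new empty list on every call, even when the key already exists.
✔ It is a natural fit for grouping data.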
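You can observe the difference in how defaults are created. setdefault(key, make_list()) evaluates its argument on every call. defaultdict, by contrast, calls its factory only for missing keys. In the sketch below, the counting factory make_list is a hypothetical instrument for this demonstration.

```python
from collections import defaultdict

calls = {'count': 0}

def make_list():
    # Hypothetical instrumented factory: counts how often a new list is built.
    calls['count'] += 1
    return []

data = [('python', 'post1'), ('python', 'post2'), ('coding', 'post3')]

# setdefault(): the default argument is evaluated on every iteration.
calls['count'] = 0
posts = {}
for tag, post in data:
    posts.setdefault(tag, make_list()).append(post)
setdefault_calls = calls['count']
print(setdefault_calls)   # 3

# defaultdict: the factory runs only when a key is missing.
calls['count'] = 0
grouped = defaultdict(make_list)
for tag, post in data:
    grouped[tag].append(post)
defaultdict_calls = calls['count']
print(defaultdict_calls)  # 2

print(posts == grouped)   # True
```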
6-3. Automating Initialization in Nested Dictionaries
from collections import defaultdict
# Regular dictionary
matrix = {}
if 'x' not in matrix:
    matrix['x'] = {}
matrix['x']['y'] = 10
# Nested defaultdict
matrix = defaultdict(lambda: defaultdict(int)) # Creates a defaultdict(int) automatically each time an outer key is missing
matrix['x']['y'] += 10
✔ You can easily create nested data structures, making it very advantageous for working with multidimensional dictionaries.
✔ It excels in applications such as data mining, parsing, and storing tree structures.
The factory lambda: defaultdict(int) returns a new defaultdict(int) each time an outer key is missing, so the inner dictionaries are created automatically.
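Taking the idea one step further, a factory can refer to itself, which yields a dictionary that nests to any depth. This well-known pattern is sketched below. The function name tree is a conventional but hypothetical choice, and the configuration keys are sample data.

```python
import json
from collections import defaultdict

def tree():
    # Each missing key gets another tree, so nesting depth is unlimited.
    return defaultdict(tree)

config = tree()
config['db']['primary']['host'] = 'localhost'
config['db']['primary']['port'] = 5432
config['cache']['ttl'] = 60

# json.dumps accepts defaultdict because it is a dict subclass.
print(json.dumps(config))
# {"db": {"primary": {"host": "localhost", "port": 5432}}, "cache": {"ttl": 60}}
```

The flip side of this convenience is that a typo such as config['db']['primray'] silently creates a new empty branch instead of raising an error.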
7. Conclusion
To beginners, collections.defaultdict may first look like a simple extension of dict. The more you use it, however, the more you realize it is a structural tool that makes your code clearer and safer.
- It lets you use dictionaries without worrying about KeyError.
- You can group and accumulate data without conditional statements.
- It allows for intuitive composition of nested dictionaries.
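# Example of handling data reliably and succinctly using defaultdict
from collections import defaultdict
records = [('sales', 3000), ('dev', 4500), ('sales', 2800)]  # sample data
salaries = defaultdict(int)
for dept, amount in records:
    salaries[dept] += amount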
When a single line of code can improve readability and maintainability while also preventing errors, defaultdict is more than a convenience feature. It is a key tool for Pythonic thinking.
The next topic will be pathlib.
It offers a modern, object-oriented way to handle file and directory operations, making it useful for everyone from beginners to intermediate developers. Developers familiar with os.path will think, "Wow! This is so much simpler than os.path!" Stay tuned for the next installment.