Python has thousands of third-party libraries, but did you know that some of the most powerful tools are already included in the standard library?
This series, ‘Deep Dive into the Python Standard Library’, explores, one by one, standard library modules that are widely used but rarely discussed in depth.
The goal is not just to list functions, but to understand the underlying concepts through practical examples and take your Python skills to the next level.
In-depth Usage of collections: From Basics to Practical Applications
1. Why start with collections?
collections provides efficient, high-level container types that complement Python's built-in data types (list, dict, tuple) in both performance and structure. These containers come up often in practical code but are rarely discussed in depth.
In this series, I will focus on the five most practical classes and explain ‘why to use them’, ‘how to use them’, and ‘when they are beneficial’, starting with Counter in this article.
2. Counter – The Definitive Tool for Counting Frequencies and Beyond
Basic Concept
collections.Counter is one of the most useful classes in the standard library's collections module. As the name implies, it is a counter: a dict subclass specialized for counting how often each (hashable) item occurs.
from collections import Counter
c = Counter(['a', 'b', 'c', 'a', 'b', 'a'])
print(c) # Counter({'a': 3, 'b': 2, 'c': 1})
Pass it any iterable, such as a list, string, or tuple, and it counts how many times each element appears. A dict (or other mapping) is treated differently: its values are used directly as the counts instead of being counted.
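Here is a small sketch of both behaviors, using an arbitrary sample string. Because Counter is a dict subclass, ordinary dict operations such as isinstance checks and keys() work as usual:

```python
from collections import Counter

# A string is an iterable of characters, so each character is counted
chars = Counter("mississippi")
print(chars)  # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

# Counter is a dict subclass, so ordinary dict operations work
print(isinstance(chars, dict))  # True
print(sorted(chars.keys()))     # ['i', 'm', 'p', 's']

# A mapping is not "counted": its values are taken as the counts
from_mapping = Counter({"x": 5})
print(from_mapping["x"])  # 5
```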
Main Features and Methods
📌 Various Initialization Methods
Counter can be initialized in several ways, which makes it flexible for data analysis.
from collections import Counter
print(Counter(['a', 'b', 'a']))
# Counter({'a': 2, 'b': 1}) → List
print(Counter({'a': 2, 'b': 1}))
# Counter({'a': 2, 'b': 1}) → Dictionary
print(Counter(a=2, b=1))
# Counter({'a': 2, 'b': 1}) → Keyword arguments
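Since any iterable works, a generator expression is often the most compact way to count a derived property. A minimal sketch with a made-up word list, counting first letters, plus a check that the dict and keyword forms produce equal Counters:

```python
from collections import Counter

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Any iterable works, including a generator expression:
# here we count how many words start with each letter
first_letters = Counter(word[0] for word in words)
print(first_letters)  # Counter({'a': 2, 'b': 2, 'c': 1})

# The dict and keyword-argument forms build equal Counters
print(Counter({"a": 2, "b": 1}) == Counter(a=2, b=1))  # True
```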
📌 Element Access
Counter behaves like a dict, but accessing a non-existent key returns 0 instead of raising a KeyError.
c = Counter('hello')
print(c['l']) # 2 (The character 'l' appears twice)
print(c['x']) # 0 ('x' does not appear, but returns 0 instead of an error)
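One detail worth knowing: reading a missing key returns 0 but does not insert that key. Likewise, setting a count to 0 keeps the key, so use del to remove it. A short sketch:

```python
from collections import Counter

c = Counter("hello")

# Reading a missing key returns 0 but does NOT add the key
print(c["x"])      # 0
print("x" in c)    # False

# Setting a count to 0 keeps the key; use del to remove it entirely
c["h"] = 0
print("h" in c)    # True
del c["h"]
print("h" in c)    # False
```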
📌 Adding/Modifying Elements
You can increment existing counts or assign new values directly. Keys that do not exist yet are added automatically.
c = Counter('hello')
c['l'] += 3
print(c)
# Counter({'l': 5, 'h': 1, 'e': 1, 'o': 1})
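Because missing keys start at 0, the familiar `counts[key] += 1` loop needs no setup, unlike a plain dict, where you would write `d[key] = d.get(key, 0) + 1`. A minimal sketch that counts words by length (the word list is just sample data):

```python
from collections import Counter

# Count words by length; missing keys start at 0, so no setup is needed
length_counts = Counter()
for word in ["to", "be", "or", "not", "to", "be"]:
    length_counts[len(word)] += 1

print(length_counts)  # Counter({2: 5, 3: 1})
```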
📌 most_common(n) – Extracting the Most Frequent Elements
Returns a list of the n most common elements as (element, count) tuples, ordered from most to least frequent. If n is omitted, all elements are returned.
c = Counter('banana')
print(c.most_common(2))
# [('a', 3), ('n', 2)] → 'a' appears 3 times, 'n' appears 2 times
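Calling most_common() with no argument returns every element, and reversing that list with a slice gives the least common ones. Elements with equal counts keep the order in which they were first encountered. A quick sketch:

```python
from collections import Counter

c = Counter("banana")

# With no argument, most_common() returns every element, most frequent first
print(c.most_common())  # [('a', 3), ('n', 2), ('b', 1)]

# Slicing the full list in reverse gives the 2 least common elements
print(c.most_common()[:-3:-1])  # [('b', 1), ('n', 2)]
```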
📌 elements() – Iterating Over Repeated Elements
Returns an iterator that yields each element as many times as its count.
c = Counter('banana')
print(list(c.elements()))
# ['b', 'a', 'a', 'a', 'n', 'n']
However, elements whose count is zero or negative are skipped.
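To see the skipping in action, here is a small sketch with hand-picked counts. It also wraps the iterator in sorted() to get a reproducible, sorted sequence:

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1, d=1)

# Only positive counts produce output; 'b' (0) and 'c' (-1) are skipped
print(list(c.elements()))  # ['a', 'a', 'd']

# sorted() turns the iterator into a sorted list
print(sorted(Counter("banana").elements()))  # ['a', 'a', 'a', 'b', 'n', 'n']
```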
📌 Support for Mathematical Operations (Counter operations using +, -, &, |)
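One of Counter's most powerful features is its support for arithmetic and set-like operations. Each operation returns a new Counter containing only the keys whose resulting count is positive.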
c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)
print(c1 + c2)
# Counter({'a': 4, 'b': 3}) → Same keys are combined
print(c1 - c2)
# Counter({'a': 2}) → Negative values are ignored, 'b' is omitted as it would be negative
print(c1 & c2)
# Counter({'a': 1, 'b': 1}) → Intersection, based on minimum value
print(c1 | c2)
# Counter({'a': 3, 'b': 2}) → Union, based on maximum value
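Because subtraction drops non-positive results, it answers "is anything missing?" in one line. Here is a minimal sketch. The helper `can_build` is hypothetical, written only for this illustration: it checks whether a word can be spelled from a set of letter tiles, using each tile at most once.

```python
from collections import Counter

def can_build(word: str, letters: str) -> bool:
    """Hypothetical helper: True if `word` can be spelled using each letter in `letters` at most once."""
    # Subtraction keeps only positive counts, so anything left over
    # is a letter that `word` needs more of than `letters` provides
    missing = Counter(word) - Counter(letters)
    return not missing  # an empty Counter is falsy

print(can_build("apple", "papel"))  # True
print(can_build("apple", "aple"))   # False (only one 'p' available)
```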
Practical Examples
📌 Analyzing Word Frequencies in Strings
text = "the quick brown fox jumps over the lazy dog"
counter = Counter(text.split())
print(counter)
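Note that split() alone is case-sensitive and keeps punctuation, so "The" and "the", or "ran!" and "ran", would be counted separately. A sketch of one common fix, assuming a simple tokenization rule (lowercase everything, then keep runs of ASCII letters and apostrophes). The regex is my own simplification and ignores digits, hyphens, and non-ASCII letters:

```python
import re
from collections import Counter

text = "The cat sat. The cat ran!"

# Lowercase the text and extract runs of letters/apostrophes,
# so "The"/"the" and "ran!"/"ran" are treated as the same word
words = re.findall(r"[a-z']+", text.lower())
counter = Counter(words)
print(counter.most_common(2))  # [('the', 2), ('cat', 2)]
```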
📌 Log Frequency Analysis
logs = ['INFO', 'ERROR', 'INFO', 'DEBUG', 'ERROR', 'ERROR']
print(Counter(logs)) # Counter({'ERROR': 3, 'INFO': 2, 'DEBUG': 1})
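Real log files usually contain whole lines rather than bare levels. The sketch below assumes a hypothetical "<LEVEL> <message>" format (the lines are invented sample data) and counts just the first token of each line. The same approach can count full messages to spot recurring errors:

```python
from collections import Counter

# Hypothetical log format: "<LEVEL> <message>", one entry per line
log_lines = [
    "INFO server started",
    "ERROR database timeout",
    "INFO request handled",
    "ERROR database timeout",
    "DEBUG cache miss",
]

# Count only the first token (the level) of each non-empty line
levels = Counter(line.split(maxsplit=1)[0] for line in log_lines if line.strip())
print(levels)  # Counter({'INFO': 2, 'ERROR': 2, 'DEBUG': 1})

# Count whole ERROR lines to find the most frequent error message
errors = Counter(line for line in log_lines if line.startswith("ERROR"))
print(errors.most_common(1))  # [('ERROR database timeout', 2)]
```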
📌 Counting Duplicate Elements in a List
nums = [1, 2, 2, 3, 3, 3]
print(Counter(nums)) # Counter({3: 3, 2: 2, 1: 1})
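If you only need to know which values are duplicated, filter the counts. A short sketch using the same list:

```python
from collections import Counter

nums = [1, 2, 2, 3, 3, 3]

# Keep only the values that occur more than once
duplicates = [value for value, count in Counter(nums).items() if count > 1]
print(duplicates)  # [2, 3]
```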
Important Points to Note
- Counter inherits from dict, so (since Python 3.7) it iterates in insertion order, not in order of frequency. If you need elements sorted by count, use most_common().
- Items are not removed even if their counts drop to 0 or below, so you may need to filter them manually.
c = Counter(a=3)
c.subtract({'a': 5})
print(c) # Counter({'a': -2}) # Note that items do not disappear even if their values drop to 0 or below
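Here are two ways to do that filtering. Unary plus returns a new Counter that keeps only positive counts. A dict comprehension lets you choose your own rule; the ">= 0" condition below is just an example:

```python
from collections import Counter

c = Counter(a=3, b=1)
c.subtract({"a": 5, "b": 1})
print(c)  # Counter({'b': 0, 'a': -2})

# Unary plus returns a new Counter with only the positive counts
print(+c)  # Counter()

# A dict comprehension gives full control over which items to keep
kept = Counter({key: count for key, count in c.items() if count >= 0})
print(kept)  # Counter({'b': 0})
```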
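Tip: Accumulating Without Initialization
You can start from an empty Counter and feed it data incrementally with update(), for example one line of a file at a time.
from collections import Counter
counter = Counter()
with open("data.txt") as f:  # any whitespace-separated text file
    for line in f:
        # split() with no arguments also drops surrounding whitespace and newlines
        counter.update(line.split())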
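To try the same pattern without a file on disk, here is a small sketch that uses io.StringIO as a stand-in for data.txt (the sample text is made up). It also shows that update() treats a mapping differently from an iterable: a mapping's values are added as counts.

```python
import io
from collections import Counter

# io.StringIO stands in for a real file such as "data.txt"
fake_file = io.StringIO("red green\nred blue\n\ngreen red\n")

counter = Counter()
for line in fake_file:
    # update() with an iterable adds 1 per element (blank lines add nothing)
    counter.update(line.split())

# update() with a mapping adds the given counts instead
counter.update({"blue": 2})
print(counter)  # Counter({'red': 3, 'blue': 3, 'green': 2})
```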
Conclusion
collections.Counter is a powerful, almost indispensable tool for data analysis, log processing, and text mining. Beginners can use it as a simple way to count frequencies, while experienced developers can combine its arithmetic operations and filtering into more advanced processing.
Next Episode Preview
defaultdict – A world without KeyErrors, more flexible than dict! Stay tuned for the next episode!
Thoroughly understanding and properly using the standard library will noticeably improve the quality of your code.