[Python Standard Library – 2] Data Storage & Serialization: JSON, Pickle, CSV

Data Storage & Serialization with Python’s Standard Library: json, pickle, csv

When you need to persist data to a file or transmit it over a network, serialization is essential. Python offers three distinct standard tools for this purpose: json, pickle, and csv.

Data Delivery Formats

json – a human‑readable text format that excels at cross‑language exchange.
pickle – a binary format that stores Python objects exactly as they are, powerful but potentially risky.
csv – a plain‑text format that captures tabular data in the simplest way, highly portable.

This article outlines how to choose between these modules based on what you’re trying to achieve.

1. JSON

1.1 Overview

JSON (JavaScript Object Notation) is text‑based and language‑agnostic. It’s widely used for configuration files, API responses/requests, logs, and any scenario where data needs to be exchanged between systems. In Python, the json module handles it directly.

1.2 Core API (most common)

json.dump(obj, fp, ...) / json.load(fp, ...) – write to / read from a file.
json.dumps(obj, ...) / json.loads(s, ...) – convert to / from a string.

Key options to remember:

ensure_ascii=False – keep non‑ASCII characters (e.g., Latin‑extended, Cyrillic, CJK) unescaped.
indent=2 – pretty‑print for readability.
default=... – a fallback for types that JSON doesn’t natively support.
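To make these options concrete, here is a minimal sketch using the string-based functions. The record contents are made up for illustration.

```python
import json

# Sample record (hypothetical data) containing non-ASCII characters.
record = {"city": "Zürich", "tags": ["café"]}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as-is; indent=2 pretty-prints.
readable = json.dumps(record, ensure_ascii=False, indent=2)

print(escaped)
print(readable)

# Both strings decode back to the same Python object.
assert json.loads(escaped) == json.loads(readable) == record
```

The escaped form is safer for channels that only handle ASCII, while the unescaped form is easier to read in an editor; either way the decoded data is identical.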

1.3 Basic Usage Example

import json

data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "Data Science"]
}

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

with open("data.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded)

1.4 Handling JSON’s Type Restrictions

Out of the box, the json module serializes only dict, list, tuple (written as a JSON array), str, int, float, bool, and None. For other types such as datetime, pass a default callback that converts them into JSON‑friendly forms.

import json
from datetime import datetime, timezone

payload = {"created_at": datetime.now(timezone.utc)}

def to_jsonable(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Not JSON serializable: {type(obj)!r}")

s = json.dumps(payload, default=to_jsonable, ensure_ascii=False)
print(s)
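The default callback only covers the writing side; on load, an ISO string stays a plain str. One possible way to restore the original types is to tag the values on the way out and undo the tagging with object_hook on the way in. The sketch below assumes this tagging convention; the names encode_extra, decode_extra, "__datetime__" and "__set__" are hypothetical and not part of the json module.

```python
import json
from datetime import datetime, timezone

def encode_extra(obj):
    """default= callback: wrap unsupported types in a tagged dict."""
    if isinstance(obj, datetime):
        return {"__datetime__": obj.isoformat()}
    if isinstance(obj, set):
        # sorted() assumes the set elements are mutually comparable
        return {"__set__": sorted(obj)}
    raise TypeError(f"Not JSON serializable: {type(obj)!r}")

def decode_extra(d):
    """object_hook= callback: called for every decoded JSON object."""
    if "__datetime__" in d:
        return datetime.fromisoformat(d["__datetime__"])
    if "__set__" in d:
        return set(d["__set__"])
    return d

event = {
    "created_at": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "tags": {"a", "b"},
}

text = json.dumps(event, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)

print(text)
print(restored == event)
```

The tags are a private convention between writer and reader, so other consumers of the file will simply see nested objects with those keys.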

1.5 Pros & Cons

Pros

Language‑agnostic – great for sharing and exchanging data.
Text‑based – easy to debug and version‑control.
No code execution on load, unlike pickle.

Cons

Limited type expressiveness (e.g., datetime, set, Decimal, binary data).
Text encoding is more verbose than a binary format, so very large datasets take more space and are slower to parse.
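The limited type expressiveness also shows up silently, without any error. A small sketch with made-up data shows two conversions that happen during a round trip:

```python
import json

# Hypothetical data: a tuple value and a dict with integer keys.
original = {"point": (1, 2), "counts": {1: "one", 2: "two"}}

restored = json.loads(json.dumps(original))

print(restored)

# Tuples come back as lists, and non-string keys come back as strings.
assert restored["point"] == [1, 2]
assert restored["counts"] == {"1": "one", "2": "two"}
assert restored != original
```

If the exact Python types matter after loading, they have to be rebuilt by hand or stored in a different format.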

2. Pickle

2.1 Overview

pickle serializes Python objects exactly as they are. The output is binary data (bytes) that, when loaded, recreates an almost identical object.

Pickle shines when:

You need to store complex Python objects (custom classes, nested structures, trained models, configuration objects, cached results).
You’re not exchanging data with non‑Python systems.

Avoid pickle when:

Loading data from untrusted sources (it can execute arbitrary code).
Interoperability with other languages is required.

2.2 Core API (most used)

pickle.dump(obj, file, protocol=...) – write to a binary file.
pickle.load(file) – read from a binary file.
pickle.dumps(obj, protocol=...) – convert to bytes.
pickle.loads(data) – convert bytes back to an object.

import pickle

data = {"a": [1, 2, 3], "b": ("x", "y")}

with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

with open("data.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded)
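The in-memory functions behave the same way. As a rough illustration with made-up values, the sketch below round-trips several types that JSON cannot express directly and checks that they come back unchanged:

```python
import pickle
from datetime import date
from decimal import Decimal

# Hypothetical record mixing types that JSON has no native form for.
original = {
    "point": (1, 2),             # tuple
    "tags": {"a", "b"},          # set
    "price": Decimal("9.99"),    # Decimal
    "day": date(2024, 1, 2),     # date
}

blob = pickle.dumps(original)    # bytes, not str
restored = pickle.loads(blob)

print(type(blob).__name__, len(blob))
print(restored)

assert restored == original
assert isinstance(restored["point"], tuple)
assert isinstance(restored["price"], Decimal)
```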

2.3 What is `protocol` and why care?

protocol is the format version used by pickle. Different versions affect file size, speed, and supported features.

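If unspecified, pickle uses pickle.DEFAULT_PROTOCOL, a fixed value for each Python release (protocol 4 in Python 3.8, for example) that can be lower than pickle.HIGHEST_PROTOCOL.
Common reasons to set it explicitly:
Compatibility – choose a lower protocol so that older Python versions can read the file.
Size or speed – force the latest protocol.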

import pickle

with open("data.pkl", "wb") as f:
    pickle.dump({"x": 1}, f, protocol=pickle.HIGHEST_PROTOCOL)

In most cases, just use the default and switch to pickle.HIGHEST_PROTOCOL only when needed.
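To see what the protocol choice means on a given interpreter, one option is to serialize the same object with every available protocol and compare the sizes. This is only a sketch with made-up data; the numbers printed depend on the Python version and on the object being pickled.

```python
import pickle

# Hypothetical payload used only to compare protocols.
data = {"x": list(range(1000)), "label": "demo"}

print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)

sizes = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=proto)
    # loads() takes no protocol argument: the version is detected from the data.
    assert pickle.loads(blob) == data
    sizes[proto] = len(blob)

print(sizes)  # protocol number -> size in bytes
```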

2.4 Security Caveat (Important)

pickle.load() / pickle.loads() can execute code embedded in the pickle data. Therefore:

Never load pickles from untrusted sources.
Prefer text‑based formats like JSON for data exchange.
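The risk can be demonstrated harmlessly. In the sketch below, a hypothetical class named Sneaky uses __reduce__ to tell pickle "call this function with these arguments on load"; here the function is the innocuous len, but an attacker could name any importable callable. The second half shows one mitigation, overriding Unpickler.find_class to refuse all global lookups. The names NoGlobalsUnpickler and restricted_loads are assumptions for this example, and this approach reduces the attack surface rather than making untrusted pickles safe.

```python
import io
import pickle

class Sneaky:
    def __reduce__(self):
        # On load, the unpickler will call len("called during load").
        return (len, ("called during load",))

blob = pickle.dumps(Sneaky())
result = pickle.loads(blob)
print(result)  # the return value of len(...), not a Sneaky instance

class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses to resolve any class or function named in the pickle."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()

# Plain containers and scalars need no globals, so they still load.
print(restricted_loads(pickle.dumps({"a": [1, 2, 3]})))

try:
    restricted_loads(blob)
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```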

2.5 Pros & Cons

Pros

Stores most Python objects, including instances of custom classes (exceptions include lambdas, open file handles, and sockets).
Often faster and more compact than JSON for complex objects.

Cons

Security risk with untrusted data.
Python‑only – no cross‑language compatibility.
Changes to class definitions can break old pickles.

3. CSV

3.1 Overview

CSV (Comma‑Separated Values) is the simplest format for tabular data. It’s common in spreadsheets, data export/import, and lightweight logging.

3.2 Core API (most used)

csv.reader, csv.writer – work with lists.
csv.DictReader, csv.DictWriter – work with dictionaries (usually more convenient).
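For comparison with the dictionary-based classes, here is a minimal sketch of the list-based pair. It writes to an in-memory io.StringIO buffer instead of a file so it can run anywhere; the rows are made up.

```python
import csv
import io

# Each row is a plain list; the first row acts as the header.
rows = [["name", "age"], ["Alice", 30], ["Bob", 25]]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(repr(text))

# Reading returns lists of strings, header row included.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed)

header, *records = parsed
assert header == ["name", "age"]
assert records == [["Alice", "30"], ["Bob", "25"]]
```

With csv.reader the caller has to track column positions, which is why the dictionary-based classes tend to be more convenient once a file has more than a few columns.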

3.3 Example with `DictWriter`/`DictReader`

import csv

data = [
    {"name": "Alice", "age": 30, "city": "Seoul"},
    {"name": "Bob",   "age": 25, "city": "Busan"},
]

with open("people.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age", "city"])
    writer.writeheader()
    writer.writerows(data)

with open("people.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    loaded = list(reader)

print(loaded)

3.4 Three CSV Gotchas

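Always pass newline="" to open(), for reading as well as writing – without it, writing on Windows produces extra blank lines, and newlines inside quoted fields may be misread.
All values are read back as strings – convert types manually after reading.
Delimiters, quotes, and newlines can appear inside field values – the csv module quotes and escapes them for you, so avoid splitting or joining lines by hand.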
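The gotchas above can be checked with a short sketch. It uses an in-memory buffer and a made-up row whose values contain a comma, double quotes, and a line break, then reads the row back.

```python
import csv
import io

# Hypothetical row with characters that would break naive line.split(",").
tricky = [{"name": 'Kim, "KJ"', "note": "line one\nline two", "age": 30}]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "note", "age"])
writer.writeheader()
writer.writerows(tricky)

print(buffer.getvalue())  # fields are quoted, inner quotes are doubled

buffer.seek(0)
rows = list(csv.DictReader(buffer))

# The awkward values survive the round trip unchanged...
assert rows[0]["name"] == 'Kim, "KJ"'
assert rows[0]["note"] == "line one\nline two"
# ...but the number comes back as a string and must be converted.
assert rows[0]["age"] == "30"
age = int(rows[0]["age"])
```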

3.5 Pros & Cons

Pros

Extremely lightweight and widely supported.
Simple to read and write for tabular data.

Cons

Cannot represent nested structures.
Lacks type information, requiring manual conversion.
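One way to keep the manual conversion in a single place is a small mapping from column name to converter. This is just a sketch; CONVERTERS and read_typed are hypothetical names, and columns not listed in the mapping are left as strings.

```python
import csv
import io

# Hypothetical schema: column name -> function that parses the string.
CONVERTERS = {"id": int, "score": float}

def read_typed(lines, converters):
    """Yield rows from a CSV source with the listed columns converted."""
    for row in csv.DictReader(lines):
        yield {key: converters.get(key, str)(value) for key, value in row.items()}

sample = io.StringIO("id,name,score\r\n1,Alice,95.5\r\n2,Bob,88.0\r\n", newline="")
typed_rows = list(read_typed(sample, CONVERTERS))

print(typed_rows)
assert typed_rows[0] == {"id": 1, "name": "Alice", "score": 95.5}
```

A converter raises ValueError on malformed input, so real files with empty or dirty cells would need extra handling around that call.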

4. Comparison & Decision Guide

Feature	JSON	Pickle	CSV
Language Compatibility	Excellent	Python‑only	Excellent
Readability	High	Low (binary)	High
Type Expressiveness	Limited	High	Limited
Safety	Relatively safe	Requires caution	Relatively safe
Data Shape	Tree (nested)	Python objects	Table (rows/columns)

Rule of thumb

Exchange with other systems → json.
Persist Python objects wholesale → pickle (but only load files from sources you trust).
Export/import tabular data → csv.

5. Storing the Same Data in Three Formats

Below is a quick demo that writes the same data to JSON, Pickle, and CSV, then reads it back.

import json, pickle, csv

data = [
    {"id": 1, "name": "Alice", "score": 95.5},
    {"id": 2, "name": "Bob",   "score": 88.0},
]

# 1) JSON
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

with open("data.json", "r", encoding="utf-8") as f:
    json_loaded = json.load(f)

# 2) Pickle
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

with open("data.pkl", "rb") as f:
    pickle_loaded = pickle.load(f)

# 3) CSV
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "score"])
    writer.writeheader()
    writer.writerows(data)

with open("data.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    csv_loaded = [
        {"id": int(row["id"]), "name": row["name"], "score": float(row["score"])}
        for row in reader
    ]

print(json_loaded)
print(pickle_loaded)
print(csv_loaded)

Takeaway

CSV requires explicit type conversion on read.
Pickle is convenient but requires caution regarding the data source.

6. Wrap‑Up

Python’s standard library alone covers a wide range of data storage and serialization needs.

Need cross‑language readability? → json.
Want to preserve Python objects exactly? → pickle.
Need simple tabular export? → csv.

Choosing the right format boils down to the shape of your data and the intended use case.

whitedec
