# Data Storage & Serialization with Python’s Standard Library: `json`, `pickle`, `csv`

When you need to persist data to a file or transmit it over a network, **serialization** is essential. Python offers three distinct standard tools for this purpose: **`json`**, **`pickle`**, and **`csv`**.

![Data Delivery Formats](/media/editor_temp/6/82b2d285-16b0-473a-b88a-cd9a6f6582f2.png)

* **`json`** – a human‑readable text format that excels at cross‑language exchange.
* **`pickle`** – a binary format that stores Python objects exactly as they are, powerful but potentially risky.
* **`csv`** – a plain‑text format that captures tabular data in the simplest way, highly portable.

This article outlines how to choose between these modules based on what you’re trying to achieve.

---

## 1. JSON {#sec-c0452ee3f8b9}

### 1.1 Overview {#sec-573894aef250}

JSON (JavaScript Object Notation) is **text‑based** and **language‑agnostic**. It’s widely used for configuration files, API responses/requests, logs, and any scenario where data needs to be exchanged between systems. In Python, the `json` module handles it directly.

### 1.2 Core API (most common) {#sec-7388e1478aff}

* `json.dump(obj, fp, ...)` / `json.load(fp, ...)` – write to / read from a file.
* `json.dumps(obj, ...)` / `json.loads(s, ...)` – convert to / from a string.

Key options to remember:

* `ensure_ascii=False` – keep non‑ASCII characters (e.g., Latin‑extended, Cyrillic, CJK) unescaped.
* `indent=2` – pretty‑print for readability.
* `default=...` – a fallback for types that JSON doesn’t natively support.

### 1.3 Basic Usage Example {#sec-57d0c0653337}

```python
import json

data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "Data Science"]
}

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

with open("data.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded)
```

### 1.4 Handling JSON’s Type Restrictions {#sec-a43d0fe8a199}

JSON naturally supports only `dict`, `list`, `str`, `int`/`float`, `bool`, and `None`. For types like `datetime`, you can use the `default` callback to convert them into JSON‑friendly forms.

```python
import json
from datetime import datetime, timezone

payload = {"created_at": datetime.now(timezone.utc)}

def to_jsonable(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Not JSON serializable: {type(obj)!r}")

s = json.dumps(payload, default=to_jsonable, ensure_ascii=False)
print(s)
```

### 1.5 Pros & Cons {#sec-667d6fad184b}

**Pros**

* Language‑agnostic – great for sharing and exchanging data.
* Text‑based – easy to debug and version‑control.
* No code execution on load, unlike `pickle`.

**Cons**

* Limited type expressiveness (e.g., `datetime`, `set`, `Decimal`, binary data).
* Larger size and slower for very large datasets.

---

## 2. Pickle {#sec-7417b1955ccc}

### 2.1 Overview {#sec-1889607ea8cb}

`pickle` serializes **Python objects exactly as they are**. The output is binary data (bytes) that, when loaded, recreates an almost identical object.

Pickle shines when:

* You need to store complex Python objects (custom classes, nested structures, trained models, configuration objects, cached results).
* You’re not exchanging data with non‑Python systems.

Avoid pickle when:

* Loading data from untrusted sources (it can execute arbitrary code).
* Interoperability with other languages is required.
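Before moving on to the API, here is a minimal sketch of what “Python objects exactly as they are” means in practice. The `User` dataclass and its values are purely illustrative; the point is that a round trip through `pickle.dumps()` / `pickle.loads()` gives back an object of the same class with the same attribute values.

```python
import pickle
from dataclasses import dataclass

# Illustrative class only; any picklable class behaves the same way.
@dataclass
class User:
    name: str
    scores: list

original = User(name="Alice", scores=[95.5, 88.0])

# Serialize to bytes, then immediately restore the object.
raw = pickle.dumps(original)
restored = pickle.loads(raw)

print(type(restored))         # <class '__main__.User'>
print(restored == original)   # True: same class, same attribute values
```

Note that the class definition must still be importable when the data is loaded back, which is also why changing a class later can break old pickles (see the cons in section 2.5).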
---

### 2.2 Core API (most used) {#sec-a0a97fe87a25}

* `pickle.dump(obj, file, protocol=...)` – write to a binary file.
* `pickle.load(file)` – read from a binary file.
* `pickle.dumps(obj, protocol=...)` – convert to bytes.
* `pickle.loads(data)` – convert bytes back to an object.

```python
import pickle

data = {"a": [1, 2, 3], "b": ("x", "y")}

with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

with open("data.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

---

### 2.3 What is `protocol` and why care? {#sec-1f8e5565848c}

`protocol` is the **format version** used by pickle. Different versions affect file size, speed, and supported features.

* If unspecified, Python chooses the best default for the current environment.
* Common reasons to set it explicitly:
  1. Compatibility with very old Python versions.
  2. Optimizing for size or speed by forcing the latest protocol.

```python
import pickle

with open("data.pkl", "wb") as f:
    pickle.dump({"x": 1}, f, protocol=pickle.HIGHEST_PROTOCOL)
```

In most cases, just use the default and switch to `pickle.HIGHEST_PROTOCOL` only when needed.

---

### 2.4 Security Caveat (Important) {#sec-007b70a5e9ec}

`pickle.load()` / `pickle.loads()` can execute code embedded in the pickle data. Therefore:

* Never load pickles from untrusted sources.
* Prefer text‑based formats like JSON for data exchange.

---

### 2.5 Pros & Cons {#sec-bbd5a24bec31}

**Pros**

* Stores virtually any Python object.
* Often faster and more compact than JSON for complex objects.

**Cons**

* Security risk with untrusted data.
* Python‑only – no cross‑language compatibility.
* Changes to class definitions can break old pickles.

---

## 3. CSV {#sec-99e590b24e50}

### 3.1 Overview {#sec-1265cc44a461}

CSV (Comma‑Separated Values) is the simplest format for tabular data. It’s common in spreadsheets, data export/import, and lightweight logging.

### 3.2 Core API (most used) {#sec-5b2e4c37cce5}

* `csv.reader`, `csv.writer` – work with lists.
* `csv.DictReader`, `csv.DictWriter` – work with dictionaries (usually more convenient).

### 3.3 Example with `DictWriter`/`DictReader` {#sec-524921270351}

```python
import csv

data = [
    {"name": "Alice", "age": 30, "city": "Seoul"},
    {"name": "Bob", "age": 25, "city": "Busan"},
]

with open("people.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age", "city"])
    writer.writeheader()
    writer.writerows(data)

with open("people.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    loaded = list(reader)

print(loaded)
```

### 3.4 Three CSV Gotchas {#sec-3db1b7ed8444}

1. **Always use `newline=""`** – especially on Windows to avoid double line breaks.
2. **All values are strings** – convert types manually after reading.
3. **Delimiters, quotes, and newlines can appear in data** – the `csv` module handles these edge cases.

### 3.5 Pros & Cons {#sec-6a8ca4891a8e}

**Pros**

* Extremely lightweight and widely supported.
* Simple to read and write for tabular data.

**Cons**

* Cannot represent nested structures.
* Lacks type information, requiring manual conversion.

---

## 4. Comparison & Decision Guide {#sec-6304ec188cc0}

| Feature | JSON | Pickle | CSV |
|---------|------|--------|-----|
| Language Compatibility | Excellent | Python‑only | Excellent |
| Readability | High | Low (binary) | High |
| Type Expressiveness | Limited | High | Limited |
| Safety | Relatively safe | Requires caution | Relatively safe |
| Data Shape | Tree (nested) | Python objects | Table (rows/columns) |

**Rule of thumb**

* **Exchange with other systems** → `json`.
* **Persist Python objects wholesale** → `pickle` (but be careful about the source).
* **Export/import tabular data** → `csv`.
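To make the “type expressiveness” row above concrete, here is a small sketch (the field names are arbitrary): a value containing a `datetime` and a `set` round-trips through `pickle` unchanged, while `json.dumps()` rejects it unless you supply a `default=` converter as shown in section 1.4.

```python
import json
import pickle
from datetime import datetime, timezone

# Arbitrary example record mixing JSON-friendly and JSON-unfriendly types.
record = {
    "created_at": datetime.now(timezone.utc),
    "tags": {"python", "serialization"},
}

# Pickle handles both types natively.
restored = pickle.loads(pickle.dumps(record))
print(restored["tags"])  # comes back as a real set

# JSON does not, unless you convert such values yourself (e.g., via default=).
try:
    json.dumps(record)
except TypeError as exc:
    print("json.dumps failed:", exc)
```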
---

## 5. Storing the Same Data in Three Formats {#sec-21f8c2a6d6c9}

Below is a quick demo that writes the same data to JSON, Pickle, and CSV, then reads it back.

```python
import json, pickle, csv

data = [
    {"id": 1, "name": "Alice", "score": 95.5},
    {"id": 2, "name": "Bob", "score": 88.0},
]

# 1) JSON
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

with open("data.json", "r", encoding="utf-8") as f:
    json_loaded = json.load(f)

# 2) Pickle
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

with open("data.pkl", "rb") as f:
    pickle_loaded = pickle.load(f)

# 3) CSV
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "score"])
    writer.writeheader()
    writer.writerows(data)

with open("data.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    csv_loaded = [
        {"id": int(row["id"]), "name": row["name"], "score": float(row["score"])}
        for row in reader
    ]

print(json_loaded)
print(pickle_loaded)
print(csv_loaded)
```

**Takeaway**

* CSV requires explicit type conversion on read.
* Pickle is convenient, but only load it from sources you trust.

---

## 6. Wrap‑Up {#sec-f88db3766ed6}

Python’s standard library alone covers a wide range of data storage and serialization needs.

* **Need cross‑language readability?** → `json`.
* **Want to preserve Python objects exactly?** → `pickle`.
* **Need simple tabular export?** → `csv`.

Choosing the right format boils down to the shape of your data and the intended use case.