# Data Storage & Serialization with Python’s Standard Library: `json`, `pickle`, `csv`

When you need to persist data to a file or transmit it over a network, **serialization** is essential. Python offers three distinct standard tools for this purpose: **`json`**, **`pickle`**, and **`csv`**.

![Data Delivery Formats](/media/editor_temp/6/82b2d285-16b0-473a-b88a-cd9a6f6582f2.png)

* **`json`** – a human‑readable text format that excels at cross‑language exchange.
* **`pickle`** – a binary format that stores Python objects exactly as they are, powerful but potentially risky.
* **`csv`** – a plain‑text format that captures tabular data in the simplest way, highly portable.

This article outlines how to choose between these modules based on what you’re trying to achieve.

---

## 1. JSON {#sec-c0452ee3f8b9}

### 1.1 Overview {#sec-573894aef250}

JSON (JavaScript Object Notation) is **text‑based** and **language‑agnostic**. It’s widely used for configuration files, API responses/requests, logs, and any scenario where data needs to be exchanged between systems. In Python, the `json` module handles it directly.

### 1.2 Core API (most common) {#sec-7388e1478aff}

* `json.dump(obj, fp, ...)` / `json.load(fp, ...)` – write to / read from a file.
* `json.dumps(obj, ...)` / `json.loads(s, ...)` – convert to / from a string.

Key options to remember:

* `ensure_ascii=False` – keep non‑ASCII characters (e.g., Latin‑extended, Cyrillic, CJK) unescaped.
* `indent=2` – pretty‑print for readability.
* `default=...` – a fallback for types that JSON doesn’t natively support.

### 1.3 Basic Usage Example {#sec-57d0c0653337}

```python
import json

data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "Data Science"]
}

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

with open("data.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded)
```

### 1.4 Handling JSON’s Type Restrictions {#sec-a43d0fe8a199}

JSON naturally supports only `dict`, `list`, `str`, `int`/`float`, `bool`, and `None`. For types like `datetime`, you can use the `default` callback to convert them into JSON‑friendly forms.

```python
import json
from datetime import datetime, timezone

payload = {"created_at": datetime.now(timezone.utc)}

def to_jsonable(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Not JSON serializable: {type(obj)!r}")

s = json.dumps(payload, default=to_jsonable, ensure_ascii=False)
print(s)
```

### 1.5 Pros & Cons {#sec-667d6fad184b}

**Pros**

* Language‑agnostic – great for sharing and exchanging data.
* Text‑based – easy to debug and version‑control.
* No code execution on load, unlike `pickle`.

**Cons**

* Limited type expressiveness (e.g., `datetime`, `set`, `Decimal`, binary data).
* Larger size and slower for very large datasets.

---

## 2. Pickle {#sec-7417b1955ccc}

### 2.1 Overview {#sec-1889607ea8cb}

`pickle` serializes **Python objects exactly as they are**. The output is binary data (bytes) that, when loaded, recreates an almost identical object.

Pickle shines when:

* You need to store complex Python objects (custom classes, nested structures, trained models, configuration objects, cached results).
* You’re not exchanging data with non‑Python systems.

Avoid pickle when:

* Loading data from untrusted sources (it can execute arbitrary code).
* Interoperability with other languages is required.
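Before moving on to the API, here is a minimal sketch of what “Python objects exactly as they are” means in practice. The `User` dataclass and its values are purely illustrative; the point is that a round trip through `pickle.dumps()` / `pickle.loads()` gives back an object of the same class with the same attribute values.

```python
import pickle
from dataclasses import dataclass

# Illustrative class only; any picklable class behaves the same way.
@dataclass
class User:
    name: str
    scores: list

original = User(name="Alice", scores=[95.5, 88.0])

# Serialize to bytes, then immediately restore the object.
raw = pickle.dumps(original)
restored = pickle.loads(raw)

print(type(restored))         # <class '__main__.User'>
print(restored == original)   # True: same class, same attribute values
```

Note that the class definition must still be importable when the data is loaded back, which is also why changing a class later can break old pickles (see the cons in section 2.5).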
---

### 2.2 Core API (most used) {#sec-a0a97fe87a25}

* `pickle.dump(obj, file, protocol=...)` – write to a binary file.
* `pickle.load(file)` – read from a binary file.
* `pickle.dumps(obj, protocol=...)` – convert to bytes.
* `pickle.loads(data)` – convert bytes back to an object.

```python
import pickle

data = {"a": [1, 2, 3], "b": ("x", "y")}

with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

with open("data.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

---

### 2.3 What is `protocol` and why care? {#sec-1f8e5565848c}

`protocol` is the **format version** used by pickle. Different versions affect file size, speed, and supported features.

* If unspecified, Python chooses the best default for the current environment.
* Common reasons to set it explicitly:
  1. Compatibility with very old Python versions.
  2. Optimizing for size or speed by forcing the latest protocol.

```python
import pickle

with open("data.pkl", "wb") as f:
    pickle.dump({"x": 1}, f, protocol=pickle.HIGHEST_PROTOCOL)
```

In most cases, just use the default and switch to `pickle.HIGHEST_PROTOCOL` only when needed.

---

### 2.4 Security Caveat (Important) {#sec-007b70a5e9ec}

`pickle.load()` / `pickle.loads()` can execute code embedded in the pickle data. Therefore:

* Never load pickles from untrusted sources.
* Prefer text‑based formats like JSON for data exchange.

---

### 2.5 Pros & Cons {#sec-bbd5a24bec31}

**Pros**

* Stores virtually any Python object.
* Often faster and more compact than JSON for complex objects.

**Cons**

* Security risk with untrusted data.
* Python‑only – no cross‑language compatibility.
* Changes to class definitions can break old pickles.

---

## 3. CSV {#sec-99e590b24e50}

### 3.1 Overview {#sec-1265cc44a461}

CSV (Comma‑Separated Values) is the simplest format for tabular data. It’s common in spreadsheets, data export/import, and lightweight logging.

### 3.2 Core API (most used) {#sec-5b2e4c37cce5}

* `csv.reader`, `csv.writer` – work with lists.
* `csv.DictReader`, `csv.DictWriter` – work with dictionaries (usually more convenient).

### 3.3 Example with `DictWriter`/`DictReader` {#sec-524921270351}

```python
import csv

data = [
    {"name": "Alice", "age": 30, "city": "Seoul"},
    {"name": "Bob", "age": 25, "city": "Busan"},
]

with open("people.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age", "city"])
    writer.writeheader()
    writer.writerows(data)

with open("people.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    loaded = list(reader)

print(loaded)
```

### 3.4 Three CSV Gotchas {#sec-3db1b7ed8444}

1. **Always use `newline=""`** – especially on Windows to avoid double line breaks.
2. **All values are strings** – convert types manually after reading.
3. **Delimiters, quotes, and newlines can appear in data** – the `csv` module handles these edge cases.

### 3.5 Pros & Cons {#sec-6a8ca4891a8e}

**Pros**

* Extremely lightweight and widely supported.
* Simple to read and write for tabular data.

**Cons**

* Cannot represent nested structures.
* Lacks type information, requiring manual conversion.

---

## 4. Comparison & Decision Guide {#sec-6304ec188cc0}

| Feature | JSON | Pickle | CSV |
|---------|------|--------|-----|
| Language Compatibility | Excellent | Python‑only | Excellent |
| Readability | High | Low (binary) | High |
| Type Expressiveness | Limited | High | Limited |
| Safety | Relatively safe | Requires caution | Relatively safe |
| Data Shape | Tree (nested) | Python objects | Table (rows/columns) |

**Rule of thumb**

* **Exchange with other systems** → `json`.
* **Persist Python objects wholesale** → `pickle` (but be careful about the source).
* **Export/import tabular data** → `csv`.
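To make the “type expressiveness” row above concrete, here is a small sketch (the field names are arbitrary): a value containing a `datetime` and a `set` round-trips through `pickle` unchanged, while `json.dumps()` rejects it unless you supply a `default=` converter as shown in section 1.4.

```python
import json
import pickle
from datetime import datetime, timezone

# Arbitrary example record mixing JSON-friendly and JSON-unfriendly types.
record = {
    "created_at": datetime.now(timezone.utc),
    "tags": {"python", "serialization"},
}

# Pickle handles both types natively.
restored = pickle.loads(pickle.dumps(record))
print(restored["tags"])  # comes back as a real set

# JSON does not, unless you convert such values yourself (e.g., via default=).
try:
    json.dumps(record)
except TypeError as exc:
    print("json.dumps failed:", exc)
```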
---

## 5. Storing the Same Data in Three Formats {#sec-21f8c2a6d6c9}

Below is a quick demo that writes the same data to JSON, Pickle, and CSV, then reads it back.

```python
import json, pickle, csv

data = [
    {"id": 1, "name": "Alice", "score": 95.5},
    {"id": 2, "name": "Bob", "score": 88.0},
]

# 1) JSON
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

with open("data.json", "r", encoding="utf-8") as f:
    json_loaded = json.load(f)

# 2) Pickle
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

with open("data.pkl", "rb") as f:
    pickle_loaded = pickle.load(f)

# 3) CSV
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "score"])
    writer.writeheader()
    writer.writerows(data)

with open("data.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    csv_loaded = [
        {"id": int(row["id"]), "name": row["name"], "score": float(row["score"])}
        for row in reader
    ]

print(json_loaded)
print(pickle_loaded)
print(csv_loaded)
```

**Takeaway**

* CSV requires explicit type conversion on read.
* Pickle is convenient, but only load it from sources you trust.

---

## 6. Wrap‑Up {#sec-f88db3766ed6}

Python’s standard library alone covers a wide range of data storage and serialization needs.

* **Need cross‑language readability?** → `json`.
* **Want to preserve Python objects exactly?** → `pickle`.
* **Need simple tabular export?** → `csv`.

Choosing the right format boils down to the shape of your data and the intended use case.