As you browse AI model hubs like Hugging Face, you may notice an increasing number of files with the .safetensors extension, rather than the familiar .bin or .pth files.

In this post, we will explore in detail what .safetensors is, why it emerged, and what powerful advantages it has over existing methods from a technical perspective.


1. What is .safetensors?



.safetensors is a new tensor storage format developed by Hugging Face.

Deep learning models consist of billions of parameters (weights), and the format's job is to save and load these massive bundles of numbers as files. It was created to address the critical drawbacks, in both security and speed, of the previous de facto standard: serialization based on Python's pickle module.

In simple terms, it is a "safer and faster model storage file".


2. Why did it emerge: Issues with the existing method (Pickle)

Existing PyTorch models (.bin, .pth) internally use Python's pickle module to serialize data. However, pickle has a critical issue.

Security Vulnerability (Arbitrary Code Execution)

pickle does not merely store data; it serializes Python objects themselves, including the instructions for reconstructing them. Those instructions can execute arbitrary Python code, which lets an attacker embed code in a model file that compromises the system or steals personal information. The moment an unsuspecting user calls load(), that malicious code runs.

Example of a problem scenario:

The user loads a model.bin downloaded from the internet -> hidden code inside the file executes -> the user's SSH keys or passwords are sent to the attacker's server.
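To make this concrete, here is a minimal, harmless sketch of the mechanism. pickle lets any object define __reduce__, and whatever that method returns is executed during loading; the class name and the echoed command here are purely illustrative.

import os
import pickle

# A harmless demonstration of pickle's core problem: __reduce__ tells
# pickle how to rebuild an object, and it runs at load time.
class Malicious:
    def __reduce__(self):
        # A real attack would exfiltrate data or open a backdoor;
        # here we merely run a shell command to prove code execution.
        return (os.system, ("echo arbitrary code ran during load",))

payload = pickle.dumps(Malicious())
pickle.loads(payload)  # the echo runs here, during loading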

.safetensors was introduced to fundamentally eliminate such security threats.


3. Key Features of .safetensors



3.1. Safety

As the name implies, .safetensors is safe. The format stores only raw tensor data plus metadata in a JSON header; there is no room for executable code, so you can safely load files downloaded from untrusted sources.

3.2. Zero-Copy and Speed

Loading large language models (LLMs) or Stable Diffusion checkpoints is dramatically faster.

  • Existing method: copy the file into CPU memory -> unpickle -> rebuild tensors -> move them to the GPU. (Several unnecessary copies occur along the way.)

  • safetensors: uses memory mapping (mmap). The operating system maps the file directly to memory addresses, so data can be used straight from disk without unnecessary copying. This is called zero-copy; a rough timing comparison is sketched below.
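A rough sketch of the difference, assuming the same weights have been saved both as model.bin and as model.safetensors (both file names are placeholders):

import time

import torch
from safetensors.torch import load_file

# Pickle-based checkpoint: full read + unpickling + tensor rebuild.
start = time.perf_counter()
state_pickle = torch.load("model.bin", map_location="cpu")
t_pickle = time.perf_counter() - start

# safetensors checkpoint: mmap-backed, no intermediate copies.
start = time.perf_counter()
state_safe = load_file("model.safetensors", device="cpu")
t_safe = time.perf_counter() - start

print(f"pickle: {t_pickle:.3f}s  safetensors: {t_safe:.3f}s")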

3.3. Lazy Loading

It allows you to quickly read only the necessary parts without loading the entire model into memory.

For instance, if you want to inspect the weights of a single layer in a 100GB model file, the existing method forces you to read all 100GB, while .safetensors can target and read just that part, as the sketch below shows. This is a big advantage in distributed training environments and for inference optimization.
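A minimal sketch, assuming a model.safetensors file exists (the key used here is simply whatever f.keys() reports first for your file):

from safetensors import safe_open

with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    first_key = list(f.keys())[0]     # pick any stored tensor name
    tensor = f.get_tensor(first_key)  # only this tensor is read from disk
    print(first_key, tensor.shape)

    # get_slice goes further: inspect or read part of a single tensor.
    sl = f.get_slice(first_key)
    print(sl.get_shape())             # shape without reading the data
    part = sl[:2]                     # reads only the first two rows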

3.4. Framework Compatibility

It is not tied to any single deep learning framework (such as PyTorch).

  • PyTorch

  • TensorFlow

  • JAX

  • PaddlePaddle

It is designed to be easily read and written from all of the frameworks listed above, as the sketch below illustrates.
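A minimal sketch of the idea, using the library's NumPy API (shared.safetensors is a placeholder name):

import numpy as np
from safetensors.numpy import load_file, save_file

# Save with the NumPy API; the resulting file is framework-agnostic.
save_file({"weight": np.ones((2, 3), dtype=np.float32)}, "shared.safetensors")

# Load it back with NumPy. The very same file could instead be opened
# from PyTorch, TensorFlow, or JAX via safe_open(..., framework="pt"/"tf"/"flax").
loaded = load_file("shared.safetensors")
print(loaded["weight"].shape)  # (2, 3)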


4. File Structure

.safetensors files have a very simple structure.

  1. Header: Located at the beginning of the file. The first 8 bytes encode the header's length, followed by a JSON block that records each tensor's name, data type (dtype), shape, and the offsets where its data is stored.

  2. Data: A block of binary data that follows the header. It is densely filled with pure tensor values.

Thanks to this structure, it is possible to understand the model's structure by reading only the header without reading the entire file.
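In fact, the layout is simple enough to parse by hand. The sketch below is for illustration only (use the safetensors library in practice) and assumes a model.safetensors file is present:

import json
import struct

with open("model.safetensors", "rb") as f:
    # First 8 bytes: little-endian unsigned 64-bit header length N.
    header_len = struct.unpack("<Q", f.read(8))[0]
    # Next N bytes: the JSON header; raw tensor data follows it.
    header = json.loads(f.read(header_len))

for name, info in header.items():
    if name == "__metadata__":  # optional free-form metadata entry
        continue
    print(name, info["dtype"], info["shape"], info["data_offsets"])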

4-1. Practical Tip: Check Header and Metadata from Terminal

Loading a multi-gigabyte .safetensors file just to check whether it is a quantized model, or which layers it contains, is inefficient.

The safetensors library lets you quickly scan the header information without reading the entire file. You can check it instantly by entering the following short Python command in your terminal.

Of course, you can also check this by clicking the file info button at the top right of the model page on Hugging Face. Here, however, we will look at how to check it directly from the terminal.

Preparation

First, the library needs to be installed.

pip install safetensors

Command (Terminal Input)

Replace model.safetensors with the actual file path.

python -c "from safetensors import safe_open; \
with safe_open('model.safetensors', framework='pt', device='cpu') as f: \
    print('--- Metadata ---'); \
    print(f.metadata()); \
    print('\n--- Tensor Keys (Layers) ---'); \
    print(list(f.keys())[:5])" # Only output the top 5 for brevity

Interpreting Output Results

By executing this command, you can obtain two important pieces of information.

  1. Metadata: Information embedded by the model creator. If it contains entries like format: gptq or quantization: int4, you can infer it is a quantized model even when the file name does not say so. (If the creator left the metadata empty, None may appear.) The sketch after this list shows how to embed such metadata yourself.

  2. Keys: Names of the layers that make up the model. This helps you understand the model's structure.
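A minimal sketch of writing your own metadata at save time; the metadata argument takes string-to-string pairs, and the file name and entries here are illustrative:

import torch
from safetensors.torch import save_file

tensors = {"weight": torch.rand((4, 4))}
# metadata must map strings to strings; it lands in the "__metadata__"
# entry of the JSON header and is what f.metadata() returns on load.
save_file(tensors, "tagged.safetensors",
          metadata={"format": "pt", "quantization": "none"})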


5. Comparison Summary: .bin (Pickle) vs .safetensors

Feature           | .bin / .pth (Pickle-based)          | .safetensors
Security          | Risky (malicious code may execute)  | Safe (stores data only)
Loading speed     | Slow (extra CPU copies)             | Very fast (zero-copy)
Memory efficiency | Requires loading the full file      | Reads only what is needed (lazy loading)
Compatibility     | Dependent on Python/PyTorch         | Framework independent

6. Usage Example (Python)

Here is a simple example of using the safetensors library to save and load tensors.

import torch
from safetensors.torch import save_file, load_file

# 1. Create tensor and save
tensors = {
    "embedding": torch.zeros((1024, 512)),
    "attention": torch.rand((512, 512))
}

# Save the dictionary of tensors to a file
save_file(tensors, "model.safetensors")

# 2. Load file
loaded = load_file("model.safetensors")
print(loaded["embedding"].shape) 
# Output: torch.Size([1024, 512])
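If a GPU is available, load_file can also place the tensors on it directly, skipping the CPU round-trip (this snippet assumes a CUDA device):

from safetensors.torch import load_file

loaded_gpu = load_file("model.safetensors", device="cuda:0")
print(loaded_gpu["attention"].device)  # cuda:0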

7. Conclusion

.safetensors is not just a simple change of file extension; it is a necessary evolution for the security and efficiency of AI models. Major communities, including Hugging Face, have already established it as a standard format.

Whenever you download or distribute models from now on, it is recommended to use .safetensors instead of .bin. Doing so protects you from security threats and significantly reduces model loading times.

[Image: the advantages of the .safetensors format]