NumPy ndarray Basics for Deep Learning Beginners: array, dtype, shape, reshape, astype

1. Why Start with ndarray?



When you dive into deep learning, you’ll repeatedly see code that does things like:

  • Inspecting the shape of an input tensor
  • Using reshape to prepare batches
  • Converting data to float32 for GPU computation

All of these operations are built on the foundation of NumPy’s ndarray.

  • PyTorch’s Tensor closely mirrors NumPy’s ndarray.
  • Inputs, weights, and outputs in deep learning models are all multidimensional arrays (tensors).

Mastering ndarray therefore gives you most of the core vocabulary you need for tensor operations.


2. What Is an ndarray?

ndarray stands for N‑dimensional array: an array with any number of dimensions N.

  • 1‑D: vector
  • 2‑D: matrix
  • 3‑D and above: tensor (image batches, time series, video, etc.)

A quick example:

import numpy as np

x = np.array([1, 2, 3])            # 1‑D (vector)
M = np.array([[1, 2], [3, 4]])     # 2‑D (matrix)

print(type(x))          # <class 'numpy.ndarray'>
print(x.ndim, x.shape)  # 1 (3,)
print(M.ndim, M.shape)  # 2 (2, 2)

  • ndim: how many dimensions
  • shape: the size of each dimension
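These two attributes work unchanged for higher-dimensional data. The following sketch uses an all-zeros array as a stand-in for a tiny image batch. The sizes are arbitrary, and `size` is an extra attribute not introduced above that gives the total element count:

```python
import numpy as np

# A hypothetical mini "batch" of 2 grayscale images, each 3 pixels tall and 4 wide
batch = np.zeros((2, 3, 4))

print(batch.ndim)   # 3
print(batch.shape)  # (2, 3, 4)
print(batch.size)   # 24, the product of the shape entries (2 * 3 * 4)
```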

3. How Similar Is a PyTorch Tensor?



A PyTorch tensor is ultimately just a multidimensional array.

import numpy as np
import torch

x_np = np.array([[1, 2], [3, 4]])          # NumPy ndarray
x_torch = torch.tensor([[1, 2], [3, 4]])   # PyTorch Tensor

print(type(x_np))      # <class 'numpy.ndarray'>
print(type(x_torch))   # <class 'torch.Tensor'>

print(x_np.shape)      # (2, 2)
print(x_torch.shape)   # torch.Size([2, 2])

Commonalities:

  • Both are multidimensional numeric arrays.
  • Concepts like shape, reshape, and dtype are almost identical.
  • Operations (+, *, @, etc.) behave similarly.
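Beginners often confuse `*` and `@`. This small NumPy-only sketch shows how the three operators differ. The same three expressions also work on tensors built with torch.tensor:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

print(a + b)  # element-wise sum, values [[11, 22], [33, 44]]
print(a * b)  # element-wise product, values [[10, 40], [90, 160]]
print(a @ b)  # matrix product, values [[70, 100], [150, 220]]
```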

Differences that matter in deep learning:

  • NumPy runs on the CPU and has no automatic differentiation.
  • PyTorch tensors can use the GPU and support autograd.

Typical workflow:

  • Concept practice / data manipulation → NumPy
  • Actual model training → PyTorch

The more comfortable you are with ndarray, the more natural PyTorch tensor operations feel.


4. np.array: The Basic Way to Create an ndarray

The most fundamental constructor for an ndarray is np.array.

4.1 From Python Lists to ndarray

import numpy as np

# 1‑D array (vector)
x = np.array([1, 2, 3])
print(x)
print(x.ndim)   # 1
print(x.shape)  # (3,)

# 2‑D array (matrix)
M = np.array([[1, 2, 3],
              [4, 5, 6]])
print(M)
print(M.ndim)   # 2
print(M.shape)  # (2, 3)

  • A Python list (or list of lists) becomes an ndarray when passed to np.array.
  • The familiar batch_size x feature_dim matrix in deep learning is just this structure.
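To make the batch connection concrete, here is a sketch that turns a list of per-sample feature vectors into one (batch_size, feature_dim) array. The sample values are made up. np.stack is a related function not covered above that does the same job for a list of existing 1‑D arrays:

```python
import numpy as np

# Three hypothetical samples, each a feature vector of length 4
samples = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

batch = np.array(samples)
print(batch.shape)  # (3, 4) -> (batch_size, feature_dim)

# np.stack joins existing 1-D arrays along a new first axis (axis=0 by default)
vectors = [np.array(s) for s in samples]
stacked = np.stack(vectors)
print(stacked.shape)                   # (3, 4)
print(np.array_equal(batch, stacked))  # True
```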

4.2 Quickly Creating Initialized Arrays

For experiments or training examples, you often need arrays filled with zeros or random values.

zeros = np.zeros((2, 3))       # 2x3 matrix, all zeros
ones = np.ones((2, 3))         # 2x3 matrix, all ones
randn = np.random.randn(2, 3)  # Gaussian random numbers

print(zeros.shape)  # (2, 3)

The same pattern applies in PyTorch:

import torch

zeros_t = torch.zeros((2, 3))
ones_t = torch.ones((2, 3))
randn_t = torch.randn((2, 3))
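In experiments you usually want random initial values to be reproducible. The following sketch uses NumPy’s Generator API (np.random.default_rng), which is an alternative to the np.random.randn call above. The seed 0 is an arbitrary choice. The sketch also checks the default dtype of np.zeros and shows np.zeros_like, which copies another array’s shape and dtype:

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded Generator for reproducible values
w = rng.standard_normal((2, 3))       # Gaussian values, shape (2, 3)
w2 = np.random.default_rng(seed=0).standard_normal((2, 3))

print(np.array_equal(w, w2))   # True: same seed, same numbers
print(np.zeros((2, 3)).dtype)  # float64: NumPy's default for zeros/ones
z = np.zeros_like(w)           # zeros with the same shape and dtype as w
print(z.shape)                 # (2, 3)
```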

5. dtype: Understanding the Data Type

dtype represents the data type of the numbers stored in the array.

Common values:

  • int32, int64: integers
  • float32, float64: floating‑point numbers

Let’s check them:

x = np.array([1, 2, 3])
print(x.dtype)  # int64 on most platforms (int32 on Windows before NumPy 2.0)

y = np.array([1.0, 2.0, 3.0])
print(y.dtype)  # float64
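Every element of an ndarray shares a single dtype. When the inputs are mixed, NumPy picks a type that can hold all of them. Here is a short sketch of that inference:

```python
import numpy as np

mixed = np.array([1, 2.5, 3])  # one float is enough to make the whole array float
print(mixed.dtype)             # float64

flags = np.array([True, False])
print(flags.dtype)             # bool

# dtype.kind gives the category regardless of bit width: 'i' means signed integer
print(np.array([1, 2, 3]).dtype.kind)  # i
```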

5.1 Specifying dtype When Creating an Array

x = np.array([1, 2, 3], dtype=np.float32)
print(x.dtype)  # float32

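In deep learning, float32 is the standard choice. It is PyTorch’s default floating‑point dtype (torch.float32) because it balances GPU speed against memory usage. NumPy, by contrast, defaults to float64, which is why conversions between the two come up so often.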
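The memory side of that trade-off is easy to check: float32 uses 4 bytes per element, while float64 uses 8. Here is a small sketch using itemsize and nbytes. The matrix size is arbitrary:

```python
import numpy as np

# A hypothetical 1000 x 1000 weight matrix in both precisions
w64 = np.ones((1000, 1000))                    # float64 by default
w32 = np.ones((1000, 1000), dtype=np.float32)

print(w64.itemsize, w64.nbytes)  # 8 8000000 (bytes per element, total bytes)
print(w32.itemsize, w32.nbytes)  # 4 4000000
```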


6. shape: Reading the Data’s “Form”

shape is a tuple that describes the size of each dimension.

import numpy as np

x = np.array([1, 2, 3])
print(x.shape)  # (3,)

M = np.array([[1, 2, 3],
              [4, 5, 6]])
print(M.shape)  # (2, 3)

Typical shapes in deep learning:

  • A single feature vector: (feature_dim,) → e.g., (3,)
  • A batch of data: (batch_size, feature_dim) → e.g., (32, 3)
  • An image batch (PyTorch default): (batch_size, channels, height, width) → e.g., (32, 3, 224, 224)

Getting comfortable with these shapes in NumPy makes it easier to debug shape errors in PyTorch.
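Reading a shape mostly comes down to unpacking the tuple and watching how indexing removes dimensions. The following sketch uses an all-zeros array standing in for an image batch in the (batch_size, channels, height, width) layout listed above:

```python
import numpy as np

# A hypothetical batch of 32 RGB images, 224 x 224, in (N, C, H, W) order
images = np.zeros((32, 3, 224, 224), dtype=np.float32)

batch_size, channels, height, width = images.shape
print(batch_size, channels, height, width)  # 32 3 224 224

first = images[0]             # integer indexing removes the batch dimension
print(first.shape)            # (3, 224, 224)
print(images[:, 0].shape)     # (32, 224, 224): channel 0 of every image
```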


7. reshape: Changing the Form

reshape changes the shape of an array while keeping the total number of elements the same.

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
print(x.shape)  # (6,)

M = x.reshape(2, 3)
print(M)
print(M.shape)  # (2, 3)

Key point:

  • The total number of elements before and after reshape must match.
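If the element counts do not match, NumPy raises a ValueError instead of guessing. The sketch below uses np.arange, a helper not covered above that builds the sequence 0, 1, ..., n-1. The exact error message may vary between NumPy versions, so it is not reproduced here:

```python
import numpy as np

x = np.arange(6)  # [0 1 2 3 4 5], 6 elements

print(x.reshape(3, 2).shape)  # (3, 2): 3 * 2 == 6, allowed

try:
    x.reshape(4, 2)  # 4 * 2 == 8, which does not equal 6
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```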

7.1 Using -1 for Automatic Inference

In batching or image processing, -1 is handy. Passing -1 for one dimension tells NumPy to infer that dimension’s size from the total element count. Only one dimension can be -1.

x = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape: (2, 3)

# Flatten to 1‑D
flat = x.reshape(-1)        # shape: (6,)
print(flat)

# Reshape back to 2 rows, columns inferred
M = flat.reshape(2, -1)     # shape: (2, 3)
print(M)

PyTorch behaves similarly:

import torch

x_t = torch.tensor([[1, 2, 3],
                    [4, 5, 6]])  # (2, 3)

flat_t = x_t.reshape(-1)        # (6,)
M_t = flat_t.reshape(2, -1)     # (2, 3)
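It also helps to know the order in which reshape reads elements. By default, NumPy uses row-major (C) order, so the last axis changes fastest and rows are laid end to end. This sketch shows that reshaping a (2, 3) array to (3, 2) is therefore not the same as transposing it:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

print(x.reshape(-1))     # values [1, 2, 3, 4, 5, 6]: rows laid end to end
print(x.reshape(3, -1))  # values [[1, 2], [3, 4], [5, 6]]
print(x.T)               # values [[1, 4], [2, 5], [3, 6]]: the actual transpose
```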

Once you’re comfortable with reshape, you can:

  • Flatten feature maps in CNNs
  • Arrange RNN/LSTM inputs as (batch, seq_len, feature)
  • Add or remove a batch dimension, e.g., (3,) → (1, 3) (reordering axes is a job for transpose, not reshape)
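Two of these patterns are sketched below using all-zeros stand-in data with arbitrary sizes. The first flattens CNN-style feature maps while keeping the batch axis. The second turns a single sample into a batch of one. np.newaxis is an equivalent indexing shortcut not mentioned above:

```python
import numpy as np

# Hypothetical CNN feature maps: 8 samples, 16 channels, 5 x 5 each
features = np.zeros((8, 16, 5, 5))
flat = features.reshape(features.shape[0], -1)  # keep batch axis, flatten the rest
print(flat.shape)  # (8, 400), since 16 * 5 * 5 = 400

# A single sample of shape (3,) becomes a batch of one
sample = np.array([1.0, 2.0, 3.0])
batched = sample.reshape(1, -1)
print(batched.shape)                # (1, 3)
print(sample[np.newaxis, :].shape)  # (1, 3), same result via np.newaxis
```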

8. astype: Changing the Data Type

astype converts an array’s data type.

import numpy as np

x = np.array([1, 2, 3])      # integer
print(x.dtype)               # int32 or int64

x_float = x.astype(np.float32)
print(x_float)
print(x_float.dtype)         # float32

Common deep‑learning scenarios:

  • Convert integer labels to floats for losses that expect float targets (e.g., binary cross‑entropy or MSE).
  • Downcast data from NumPy’s default float64 to float32.
  • Ensure type compatibility before passing to PyTorch.
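Two details of astype are worth knowing before relying on it. First, it returns a new array by default and leaves the original unchanged. Second, casting floats to integers truncates toward zero rather than rounding. Here is a small sketch with made-up labels and values:

```python
import numpy as np

labels = np.array([0, 1, 1, 0])       # integer class labels
targets = labels.astype(np.float32)   # float targets, e.g. for a binary loss
print(targets.dtype)                  # float32
print(labels.dtype == targets.dtype)  # False: the original array is unchanged

# Converting float -> int truncates toward zero; it does not round
vals = np.array([1.7, -1.7, 2.5])
print(vals.astype(np.int64))          # values [1, -1, 2]
```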

Example:

import torch
import numpy as np

x = np.array([1, 2, 3], dtype=np.int32)
x = x.astype(np.float32)              # convert to float32
x_torch = torch.from_numpy(x)         # convert to tensor
print(x_torch.dtype)                  # torch.float32

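Mismatched types are a common source of PyTorch errors. Feeding float64 (Double) data to a float32 (Float) model typically fails with a message along the lines of “expected Float but found Double”.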
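NumPy itself does not complain about mixed float types. When you combine float32 and float64, it silently promotes the result to float64, and that is one common way float64 data ends up reaching a model. Here is a minimal sketch of the promotion, followed by a simple guard. The guard is just one illustrative pattern, not a library feature:

```python
import numpy as np

a = np.ones(3, dtype=np.float32)
b = np.ones(3)            # float64 by default

c = a + b                 # mixing float32 and float64 promotes to float64
print(c.dtype)            # float64

# Illustrative guard: downcast before handing the data to PyTorch
if c.dtype != np.float32:
    c = c.astype(np.float32)
print(c.dtype)            # float32
```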


9. Summary: The ndarray Basics Covered Today

What we’ve covered:

  1. What is an ndarray? – The core data structure for numerical data in Python, and the model for deep‑learning tensors.
  2. Relationship to PyTorch Tensor – Conceptually the same, but with GPU and autograd support.
  3. np.array – Create arrays from Python lists.
  4. dtype – Specify numeric types (int, float, 32/64‑bit).
  5. shape – Understand the dimensionality of your data.
  6. reshape – Re‑shape arrays while keeping element count constant.
  7. astype – Convert between numeric types.

Mastering these five concepts (array, dtype, shape, reshape, astype) equips you to:

  • Handle tensor shape errors confidently.
  • Bridge the gap between research papers and code.
  • Follow PyTorch tutorials with ease.
