Chapter 2.2 PyTorch Dimension Transformation Operations

Author

Brench

Published

2026-05-11

Modified

2026-05-11

2.2.1 Reshaping

Core functions: view, reshape, flatten

2.2.1.1 `torch.view` / `torch.reshape`

Academic definition: Reassign the shape of a tensor while keeping the total number of elements unchanged. view requires contiguous memory, while reshape can create a copy when that is necessary.
Intuitive explanation: Think of the same piece of clay being molded into another form: nothing is added or removed, but the outline changes.
Simple example (syntax): Arrange a 12-element vector into a matrix.

x = torch.arange(12) # shape: (12)
y = x.view(3, 4)     # shape: (3, 4)

Advanced example (LLM Multi-Head Attention): Scenario: In a Transformer, the hidden dimension is commonly divided across multiple attention heads. Suppose batch_size=32, seq_len=128, and hidden_dim=768, which corresponds to 12 heads with 64 dimensions each.

# Input: [Batch, Seq_Len, Hidden_Dim]
x = torch.randn(32, 128, 768)

# Output: [Batch, Seq_Len, Num_Heads, Head_Dim]
# Use view to split the last dimension
x_heads = x.view(32, 128, 12, 64)

2.2.1.2 `torch.flatten`

Academic definition: Merge a selected range of dimensions into a single dimension.
Intuitive explanation: Compress several layers into one flat surface while keeping the underlying content.
Simple example (syntax):

x = torch.randn(2, 3, 4)
# Flatten the last two dims: (2, 3*4) -> (2, 12)
y = x.flatten(start_dim = 1)

Advanced example (CNN/RL policy network): Scenario: In reinforcement learning with image observations, feature maps from a CNN are usually flattened before entering a fully connected layer.

# Input: [Batch, Channels, Height, Width] -> [32, 64, 7, 7]
features = torch.randn(32, 64, 7, 7)

# Flatten relevant dims for Linear Layer input
# Output: [Batch, Features] -> [32, 64*7*7] -> [32, 3136]
flat_features = features.flatten(start_dim = 1)

2.2.2 Swapping

Core functions: permute, transpose

2.2.2.1 `torch.permute`

Academic definition: Reorder every dimension according to the index sequence you provide.
Intuitive explanation: It is like assigning new seats to all dimensions: each axis moves to the position you specify.
Simple example (syntax):

x = torch.randn(2, 3, 4)
# Change to (4, 2, 3): original dim 2 first, dim 0 second, dim 1 last
y = x.permute(2, 0, 1)

Advanced example (Transformer dimension adjustment): Scenario: In multi-head attention, Num_Heads is often placed before Seq_Len so matrix multiplication can run in parallel over heads.

# Continuing from the view example: [Batch, Seq_Len, Num_Heads, Head_Dim]
x_heads = torch.randn(32, 128, 12, 64)

# Target: [Batch, Num_Heads, Seq_Len, Head_Dim]
# Swap dimension 1 and dimension 2
query = x_heads.permute(0, 2, 1, 3)

2.2.2.2 `torch.transpose`

Academic definition: Exchange exactly two selected dimensions.
Intuitive explanation: It is less general than permute, but cleaner when only one pair of axes needs to change places.
Advanced example (preparing matrix multiplication): Scenario: When computing attention scores, the Key tensor usually needs its last two dimensions swapped to align with Query.

# Key: [Batch, Heads, Seq_Len, Head_Dim]
K = torch.randn(32, 12, 128, 64)

# Transpose last two dims for matmul: [Batch, Heads, Head_Dim, Seq_Len]
K_t = K.transpose(-1, -2)

2.2.3 Adding and Removing Dimensions

Core functions: unsqueeze, squeeze

Simple example: unsqueeze inserts a size-1 dimension, while squeeze removes dimensions of size 1.
Advanced example (broadcasting): Scenario: When applying an attention mask in a large model, the mask is often 2D even though it must be added to a 4D attention score tensor.

# Scores: [Batch, Heads, Seq_Len, Seq_Len]
scores = torch.randn(32, 12, 128, 128)

# Mask: [Batch, Seq_Len] (for example, a padding mask)
mask = torch.ones(32, 128)

# Expand dimensions so the mask can be added to scores: [Batch, 1, 1, Seq_Len]
# Insert singleton dimensions at dim 1 and dim 2
mask_expanded = mask.unsqueeze(1).unsqueeze(2)

# Now mask_expanded can be added to scores through broadcasting

2.2.4 Combining

Core functions: cat (concatenate), stack

2.2.4.1 `torch.cat`

Academic definition: Join tensors along a dimension that already exists.
Intuitive explanation: Like connecting several segments end to end: the line becomes longer along the same direction.
Advanced example (KV Cache inference acceleration): Scenario: During autoregressive LLM generation, newly produced Keys/Values are appended to the existing cache.

# Cache: [Batch, Heads, Past_Len, Head_Dim]
past_key = torch.randn(1, 12, 50, 64)

# New Token Key: [Batch, Heads, 1, Head_Dim]
new_key = torch.randn(1, 12, 1, 64)

# Concatenate along the Sequence dimension (dim=2) -> [1, 12, 51, 64]
current_key = torch.cat([past_key, new_key], dim = 2)

2.2.4.2 `torch.stack`

Academic definition: Create a new dimension and combine tensors along that new axis.
Intuitive explanation: Instead of extending an old direction, place several tensors into a new layer.
Advanced example (RL Experience Replay): Scenario: In reinforcement learning, states sampled from several steps in a replay buffer can be organized into one batch.

2.2.5 Summary Comparison Table

Function	Operation Essence	Typical Academic Scenario (LLM/RL)
`view` / `reshape`	Shape rearrangement (element count unchanged)	Split `Hidden_Dim` into `Num_Heads * Head_Dim`
`flatten`	Dimension merging (many dimensions to one)	Prepare CNN features before an MLP decision layer
`permute`	Global reordering (move axes arbitrarily)	Convert `(B, L, H, D)` to `(B, H, L, D)` for parallel attention
`transpose`	Two-axis swap (exchange one pair only)	Prepare matrix transposition for attention score computation
`unsqueeze`	Axis insertion (add a size-1 dimension)	Build masks that can broadcast to multi-head attention matrices
`cat`	Join along an old axis (extend an existing dimension)	KV Cache updates during inference; multimodal feature fusion
`stack`	Join along a new axis (create an extra dimension)	Organize multiple time-step states into a batch

Reuse

CC BY-NC 4.0

2.2.1 Reshaping

2.2.1.1 torch.view / torch.reshape

2.2.1.2 torch.flatten