
09-08 — Tensor Algebra Introduction

Phase: 9 (Matrix Decompositions & Advanced Linear Algebra)
Subject: 09-08
Prerequisites: 09-07 Eigendecomposition Algorithms
Next subject: 09-09 Numerical Linear Algebra

Learning Objectives

Define tensors as multilinear maps and distinguish between covariant, contravariant, and mixed tensors
Compute tensor products of vectors and vector spaces, and represent tensors using multi-dimensional arrays
Understand rank-1 tensors and the CP (CANDECOMP/PARAFAC) decomposition as a generalization of matrix SVD
Describe the Tucker decomposition and its relationship to higher-order SVD (HOSVD)
Apply Einstein summation convention to compactly express tensor operations

Core Content

1. Tensors as Multilinear Maps

A tensor is a multilinear map that takes multiple vectors and/or covectors as input and returns a scalar.

A tensor of type (p, q) takes p covectors (row vectors) and q vectors as arguments:

$T: \underbrace{V^* × \cdots × V^*}_{p \text{ copies}} × \underbrace{V × \cdots × V}_{q \text{ copies}} → ℝ$

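Contravariant components (upper indices): transform like vector components, that is, inversely to a change of basis (hence "contra")
Covariant components (lower indices): transform like covector components, that is, in the same way as the basis vectors (hence "co")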

In coordinates, once a basis {e₁, ..., e_n} of V is chosen (with dual basis {e¹, ..., eⁿ} of V*), a (p, q)-tensor T is represented by an array with p + q indices and n^{p+q} components:

T = Σ T^{i₁...i_p}_{j₁...j_q}  e_{i₁} ⊗ ... ⊗ e_{i_p} ⊗ e^{j₁} ⊗ ... ⊗ e^{j_q}

For our purposes (data/ML applications), we typically work in Euclidean space with an orthonormal basis where the distinction between covariant and contravariant collapses, and we can think of tensors simply as multi-dimensional arrays.

Order-0 tensor: scalar (1 number)
Order-1 tensor: vector (n numbers, 1 index)
Order-2 tensor: matrix (n² numbers, 2 indices)
Order-3 tensor: "cube" (n³ numbers, 3 indices)
Order-d tensor: d-dimensional array (nᵈ numbers, d indices)
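As a small illustration of the list above, the following sketch uses NumPy arrays as stand-ins for tensors of order 0 to 3. The choice n = 4 and the dictionary names are assumptions made only for this example; `ndim` plays the role of the order and `size` the number of components.

```python
import numpy as np

n = 4  # illustrative mode dimension (an assumption for this sketch)
tensors = {
    "scalar": np.array(3.0),        # order 0
    "vector": np.zeros(n),          # order 1
    "matrix": np.zeros((n, n)),     # order 2
    "cube": np.zeros((n, n, n)),    # order 3
}

# ndim is the number of indices (the order); size is the number of entries n**d
orders = {name: t.ndim for name, t in tensors.items()}
sizes = {name: t.size for name, t in tensors.items()}
print(orders)
print(sizes)
```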

2. Tensor Product (Kronecker Product for Matrices)

CRITICAL (Foundational): Tensors generalize vectors and matrices to higher orders as multilinear maps, and the tensor product builds higher-order objects from lower-order ones. This is the language of relativity, mechanics, and deep learning.

The tensor product of two vectors u ∈ ℝ^m and v ∈ ℝ^n is:

$(u ⊗ v)_{ij} = u_i v_j$

This is an $m × n$ matrix of rank 1 (for nonzero u and v). Every rank-1 matrix can be written as u ⊗ v.

The tensor product of two vector spaces V ⊗ W has dimension dim(V) · dim(W). If {e_i} is a basis for V and {f_j} for W, then {e_i ⊗ f_j} is a basis for V ⊗ W.

Tensor product of tensors: If A is an order-p tensor and B is an order-q tensor, A ⊗ B is an order-(p+q) tensor:

(A ⊗ B)_{i₁...i_p, j₁...j_q} = A_{i₁...i_p} · B_{j₁...j_q}

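Kronecker product (for matrices): For A ∈ ℝ^{m×n} and B ∈ ℝ^{p×q}:

$A ⊗ B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}$

This is an $mp × nq$ matrix.

Common Pitfall: The Kronecker product is NOT commutative: A ⊗ B ≠ B ⊗ A in general. The identity (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD) holds only when the products AC and BD are defined. Do not confuse the Kronecker product, which is an mp × nq matrix, with the tensor product of A and B as an order-4 array: both contain the same entries a_{ij} b_{kl}, but the Kronecker product arranges them as a matrix.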

Key property: $(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)$ (when dimensions match).
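The mixed-product property can be checked numerically. The sketch below uses `numpy.kron` on random matrices whose shapes are arbitrary choices for illustration, picked so that AC and BD are defined; it also checks that swapping the factors changes the result.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 4))   # A @ C is defined (2x4)
B = rng.standard_normal((5, 2))
D = rng.standard_normal((2, 3))   # B @ D is defined (5x3)

lhs = np.kron(A, B) @ np.kron(C, D)   # (10x6) @ (6x12)
rhs = np.kron(A @ C, B @ D)           # kron of (2x4) and (5x3)

property_holds = np.allclose(lhs, rhs)
commutes = np.allclose(np.kron(A, B), np.kron(B, A))
print(lhs.shape, property_holds, commutes)
```

Both Kronecker products in the last check have the same shape, so the comparison is meaningful; they simply place the entries in different positions.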

Vec operator: vec(A) stacks the columns of A into a single vector. The Kronecker product connects to the vec operator via:

$vec(A X B) = (B^T ⊗ A) vec(X)$
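The vec identity depends on stacking columns, which in NumPy corresponds to Fortran (column-major) order. The following is a minimal check under that assumption; the helper name `vec` and the matrix shapes are illustrative choices.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
X = rng.standard_normal((2, 4))
B = rng.standard_normal((4, 5))

lhs = vec(A @ X @ B)                 # length 3 * 5 = 15
rhs = np.kron(B.T, A) @ vec(X)       # (15x8) @ (8,)
identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```

With NumPy's default row-major flattening the identity would not hold in this form, which is a common source of errors when translating formulas from textbooks.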

3. Rank-1 Tensors and CP Decomposition

An order-d tensor 𝒳 is rank-1 if it can be written as the tensor product of d vectors:

𝒳 = a^{(1)} ⊗ a^{(2)} ⊗ ... ⊗ a^{(d)}

In component form: 𝒳_{i₁,i₂,...,i_d} = a^{(1)}_{i₁} · a^{(2)}_{i₂} · ... · a^{(d)}_{i_d}

The rank of a tensor is the minimum number of rank-1 tensors needed to sum to it:

𝒳 = Σ_{r=1}^{R} a_r^{(1)} ⊗ a_r^{(2)} ⊗ ... ⊗ a_r^{(d)}

This is the CP decomposition (CANDECOMP/PARAFAC), which generalizes the idea of matrix rank:

𝒳 = Σ_{r=1}^{R} λ_r · a_r^{(1)} ⊗ a_r^{(2)} ⊗ ... ⊗ a_r^{(d)}

where each a_r^{(k)} is a unit vector and $λ_r$ are weights.
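To make the weighted sum concrete, here is a sketch that assembles a tensor from CP weights and factor matrices. The function name `cp_to_tensor`, the convention that column r of each factor matrix holds a_r^{(k)}, and the random example data are assumptions for illustration.

```python
import numpy as np

def cp_to_tensor(weights, factors):
    """Sum of weighted rank-1 terms; factors[k][:, r] plays the role of a_r^{(k)}."""
    shape = tuple(F.shape[0] for F in factors)
    X = np.zeros(shape)
    for r, lam in enumerate(weights):
        term = lam
        for F in factors:
            # Each outer product appends one mode to the rank-1 term
            term = np.multiply.outer(term, F[:, r])
        X += term
    return X

rng = np.random.default_rng(0)
factors = []
for n in (3, 4, 5):
    F = rng.standard_normal((n, 2))
    factors.append(F / np.linalg.norm(F, axis=0))   # unit-norm columns
weights = np.array([2.0, 0.5])

X = cp_to_tensor(weights, factors)
print(X.shape)
```

By construction X has CP rank at most 2 here, since it is a sum of two rank-1 terms.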

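For matrices (d=2): the SVD is a CP decomposition (a sum of rank-1 matrices σ_r u_r ⊗ v_r) with the additional property that the factor vectors are orthonormal. For higher orders, CP factors are in general not orthogonal.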

Key differences from matrices:
- Determining the CP rank of a tensor is NP-hard in general
- A best rank-R approximation may not exist for tensors of order 3 or higher (the set of tensors of rank at most R is not closed for R > 1)

Applications: Component analysis of multi-way data (e.g., EEG data: channels × time × trials; recommendation: users × items × context).

4. Tucker Decomposition and HOSVD

The Tucker decomposition factorizes a tensor into a core tensor multiplied by factor matrices along each mode:

𝒳 ≈ 𝒢 ×₁ A^{(1)} ×₂ A^{(2)} ×₃ ... ×_d A^{(d)}

where:
- 𝒢 is the core tensor (r₁ × r₂ × ... × r_d)
- A^{(k)} are factor matrices (n_k × r_k)
- $×_k$ is the k-mode product

The k-mode product $𝒴 = 𝒳 ×_k M$ multiplies every mode-k fiber of 𝒳 by M:

𝒴_{i₁,...,j,...,i_d} = Σ_{i_k} 𝒳_{i₁,...,i_k,...,i_d} · M_{j, i_k}
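The k-mode product formula translates directly into a contraction followed by an axis move. The sketch below is one possible implementation, with the function name `mode_product` and 0-based mode numbering as assumptions of this example (so mode 2 in the text is `k = 1` in the code).

```python
import numpy as np

def mode_product(X, M, k):
    """Compute X x_k M for a matrix M of shape (J, X.shape[k]); k is 0-based."""
    # Contract the columns of M with mode k of X; the new index j comes out first
    Y = np.tensordot(M, X, axes=(1, k))
    # Move the new index back to position k so the other modes keep their places
    return np.moveaxis(Y, 0, k)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 4))
M = rng.standard_normal((5, 3))

Y = mode_product(X, M, 1)   # multiplies every mode-2 fiber of X by M
print(Y.shape)
```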

Higher-Order SVD (HOSVD): For each mode k, compute the SVD of the mode-k unfolding (matricization) X_{(k)} and take the r_k leading left singular vectors as the columns of A^{(k)}. Then compute the core tensor as $𝒢 = 𝒳 ×₁ (A^{(1)})^T ×₂ ... ×_d (A^{(d)})^T$.

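HOSVD provides a quasi-optimal rank-(r₁, ..., r_d) approximation: unlike the truncated SVD for matrices (Eckart-Young), it is not the best approximation in general, but its error is at most √d times the best achievable error.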
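The HOSVD procedure described above can be sketched in a few lines of NumPy. The helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) and the random test tensor are assumptions for illustration, modes are numbered from 0, and the unfolding used here orders its columns differently from some textbooks, which does not affect the left singular vectors.

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding: mode k indexes the rows (0-based k)."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def mode_product(X, M, k):
    """k-mode product X x_k M for a matrix M of shape (J, X.shape[k])."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, k)), 0, k)

def hosvd(X, ranks):
    """Truncated HOSVD: returns the core tensor and the factor matrices."""
    factors = []
    for k, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, k), full_matrices=False)
        factors.append(U[:, :r])            # r leading left singular vectors
    core = X
    for k, U in enumerate(factors):
        core = mode_product(core, U.T, k)   # project mode k onto its subspace
    return core, factors

def reconstruct(core, factors):
    """Multiply the core by each factor matrix along its mode."""
    X = core
    for k, U in enumerate(factors):
        X = mode_product(X, U, k)
    return X

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))

core_full, factors_full = hosvd(X, (3, 4, 5))
full_error = np.linalg.norm(X - reconstruct(core_full, factors_full))

core_small, factors_small = hosvd(X, (2, 2, 2))
small_error = np.linalg.norm(X - reconstruct(core_small, factors_small))
print(core_small.shape, full_error, small_error)
```

With full ranks the factor matrices are square and orthogonal, so the reconstruction is exact up to rounding; with smaller ranks the core shrinks and the reconstruction becomes an approximation.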

Comparison:
- CP: 𝒳 = Σ λ_r a_r^{(1)} ⊗ ... ⊗ a_r^{(d)}, with a diagonal core (generalized SVD)
- Tucker: 𝒳 = 𝒢 ×₁ A^{(1)} ... ×_d A^{(d)}, with a full core (generalized PCA)
- CP is a special case of Tucker where the core is diagonal (r = r₁ = ... = r_d)
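The last point of the comparison can be verified numerically for an order-3 tensor: placing the CP weights on the superdiagonal of a Tucker core reproduces the CP sum. The weights, factor shapes, and variable names below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 2
lam = np.array([2.0, -1.0])                 # CP weights
A = rng.standard_normal((3, R))
B = rng.standard_normal((4, R))
C = rng.standard_normal((5, R))

# CP form: sum over r of lam_r * a_r (x) b_r (x) c_r
cp = np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)

# Tucker form with a core that is zero except on the superdiagonal
core = np.zeros((R, R, R))
idx = np.arange(R)
core[idx, idx, idx] = lam
tucker = np.einsum("pqr,ip,jq,kr->ijk", core, A, B, C)

same = np.allclose(cp, tucker)
print(same)
```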

5. Einstein Summation Convention

When an index appears twice in a product (once as superscript, once as subscript), sum over it:

Vector dot product: u_i v^i means $Σ_i u_i v_i$

Matrix-vector product: $y^i = A^i_j x^j$ means $y_i = Σ_j A_{ij} x_j$

Matrix multiplication: $C^i_j = A^i_k B^k_j$ means C_{ij} = Σ_k A_{ik} B_{kj}

Tensor contraction: 𝒯^{ijk}_{kl} 𝒮^l_{m} — sum over k and l (indices appearing twice)

In ML/data contexts with all indices as subscripts: C_{ij} = A_{ik} B_{kj} efficiently describes matrix multiplication and its tensor generalizations without writing sum signs.

Contracting an order-p tensor with an order-q tensor along one pair of matching modes yields an order-(p+q-2) tensor; each additional contracted pair lowers the order by a further 2. This generalizes the inner product and matrix multiplication to higher orders.

Example — mode-1 product in Einstein notation: 𝒴_{j, i₂, ..., i_d} = 𝒳_{i₁, i₂, ..., i_d} · M_{j, i₁} (sum over i₁).
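In NumPy, the all-subscript form of the convention maps onto `numpy.einsum`, where repeated letters that are absent from the output are summed. The arrays below are illustrative values chosen for this sketch.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
dot = np.einsum("i,i->", u, v)            # u_i v_i

A = np.arange(6.0).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]
x = np.array([1.0, 0.0, -1.0])
y = np.einsum("ij,j->i", A, x)            # y_i = A_ij x_j

X = np.arange(24.0).reshape(2, 3, 4)      # order-3 tensor
M = np.arange(10.0).reshape(5, 2)         # M_{j, i1}
Y = np.einsum("iab,ji->jab", X, M)        # mode-1 product, sum over i

print(dot, y, Y.shape)
```

The mode-1 product contracts one pair of indices between an order-3 and an order-2 tensor, so the result has order 3 + 2 - 2 = 3.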

Key Terms

CP decomposition
Contravariant components
Covariant components
Kronecker product
Tensor contraction
Tucker decomposition

Worked Examples

Example 1: Tensor Product of Vectors

Let $u = [1, 2]^T$, $v = [3, 4, 5]^T$. Their tensor product:

$u ⊗ v = \begin{bmatrix} 1·3 & 1·4 & 1·5 \\ 2·3 & 2·4 & 2·5 \end{bmatrix} = \begin{bmatrix} 3 & 4 & 5 \\ 6 & 8 & 10 \end{bmatrix}$

This is a rank-1 matrix. All rows are multiples of v^T, all columns are multiples of u.
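The same computation can be reproduced with `numpy.outer`, which forms the tensor product of two vectors; the rank check uses `numpy.linalg.matrix_rank`.

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4, 5])

outer = np.outer(u, v)                    # (u ⊗ v)_ij = u_i v_j
rank = np.linalg.matrix_rank(outer)
print(outer)
print(rank)
```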

Example 2: Kronecker Product

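$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 5 \\ 6 & 7 \end{bmatrix}$

$A ⊗ B = \begin{bmatrix} 1·B & 2·B \\ 3·B & 4·B \end{bmatrix} = \begin{bmatrix} 0 & 5 & 0 & 10 \\ 6 & 7 & 12 & 14 \\ 0 & 15 & 0 & 20 \\ 18 & 21 & 24 & 28 \end{bmatrix}$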

Verify property: $(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)$ where $C = I₂, D = I₂$: $(A ⊗ B)(I₂ ⊗ I₂) = A ⊗ B$. Also $(A I₂) ⊗ (B I₂) = A ⊗ B$. ✓
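For a quick cross-check of this example, `numpy.kron` computes the Kronecker product directly. The sketch also computes B ⊗ A to show that the order of the factors matters.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 5], [6, 7]])

AB = np.kron(A, B)   # blocks a_ij * B
BA = np.kron(B, A)   # blocks b_ij * A
print(AB)
print(BA)
```

Both results are 4×4 and contain the same products a_{ij} b_{kl}, but in different positions.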

Example 3: CP Decomposition of a 2×2×2 Tensor

Consider the order-3 tensor 𝒳 with entries:

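$𝒳_{:,:,1} = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, \qquad 𝒳_{:,:,2} = \begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix}$

This is rank-1! Write it as a ⊗ b ⊗ c:

a = [1, 2]^T (mode-1 factors)
b = [1, 2]^T (mode-2 factors)
c = [1, 2]^T (mode-3 factors)

Verify: $𝒳_{ijk} = a_i · b_j · c_k$
- 𝒳_{111} = 1·1·1 = 1 ✓
- 𝒳_{211} = 2·1·1 = 2 ✓
- 𝒳_{121} = 1·2·1 = 2 ✓
- 𝒳_{221} = 2·2·1 = 4 ✓
- 𝒳_{112} = 1·1·2 = 2 ✓
- 𝒳_{212} = 2·1·2 = 4 ✓
- 𝒳_{122} = 1·2·2 = 4 ✓
- 𝒳_{222} = 2·2·2 = 8 ✓

All eight entries match. The second frontal slice is c₂ = 2 times the first, as required for a rank-1 tensor.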
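As a sketch of how to check such an example by machine, the tensor a ⊗ b ⊗ c for the factor vectors a = b = c = [1, 2]^T can be built with `numpy.einsum` and its frontal slices inspected. NumPy indices are 0-based, so `X[:, :, 0]` is the first frontal slice.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([1, 2])
c = np.array([1, 2])

X = np.einsum("i,j,k->ijk", a, b, c)   # X_ijk = a_i * b_j * c_k
front = X[:, :, 0]                     # frontal slice for k = 1
back = X[:, :, 1]                      # frontal slice for k = 2

# Each unfolding of a nonzero rank-1 tensor is a rank-1 matrix
unfolding_rank = np.linalg.matrix_rank(X.reshape(2, -1))
print(front)
print(back)
```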

Example 4: Einstein Summation

Matrix multiplication C = A B in Einstein notation: C_{ij} = A_{ik} B_{kj} (sum over k)

For $A = [[1,2],[3,4]]$, $B = [[5,6],[7,8]]$:

$C_{11} = A_{1k}B_{k1} = 1·5 + 2·7 = 19$
$C_{12} = A_{1k}B_{k2} = 1·6 + 2·8 = 22$
$C_{21} = A_{2k}B_{k1} = 3·5 + 4·7 = 43$
$C_{22} = A_{2k}B_{k2} = 3·6 + 4·8 = 50$

Trace of A: tr(A) = A_{ii} (sum over i).
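These index expressions can be evaluated with `numpy.einsum`, using the same A and B as in the example; the subscript strings mirror the Einstein notation.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = np.einsum("ik,kj->ij", A, B)   # C_ij = A_ik B_kj, sum over k
trace = np.einsum("ii->", A)       # A_ii, sum over i
print(C)
print(trace)
```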

Quiz

Q1: What characterizes the contravariant components of a tensor?

A) They carry upper indices and transform like the components of a vector B) They carry lower indices and transform like the components of a covector C) They are the entries of the core tensor in a Tucker decomposition D) They are the weights λ_r in a CP decomposition

Correct: A)

If you chose A: Contravariant components are written with upper indices and transform like vector components. Correct!
If you chose B: This is incorrect. Lower indices that transform like covector components describe covariant components.
If you chose C: This is incorrect. The core tensor belongs to the Tucker decomposition and has nothing to do with index variance.
If you chose D: This is incorrect. The weights λ_r scale the rank-1 terms of a CP decomposition and have nothing to do with index variance.

Q2: What do the covariant components of a tensor describe?

A) The lower-index components, which transform like the components of a covector B) The upper-index components, which transform like the components of a vector C) The singular values of the mode-k unfoldings D) The number of rank-1 terms needed to represent the tensor

Correct: A)

If you chose A: Covariant components are written with lower indices and transform like covector components. Correct!
If you chose B: This is incorrect. Upper indices that transform like vector components describe contravariant components.
If you chose C: This is incorrect. Singular values of unfoldings appear in the HOSVD, not in the definition of covariant components.
If you chose D: This is incorrect. The minimum number of rank-1 terms is the tensor rank.

Q3: Which statement about the Kronecker product is TRUE?

A) It is commutative: A ⊗ B = B ⊗ A for all matrices B) For A ∈ ℝ^{m×n} and B ∈ ℝ^{p×q}, A ⊗ B has shape (m+p) × (n+q) C) It is defined only for square matrices D) It satisfies (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD) whenever the products AC and BD are defined

Correct: D)

If you chose A: This is incorrect. In general A ⊗ B ≠ B ⊗ A.
If you chose B: This is incorrect. The shape is mp × nq, because every entry of A is replaced by a scaled copy of B.
If you chose C: This is incorrect. The Kronecker product is defined for matrices of any shapes.
If you chose D: This is the mixed-product property covered in the core content. Correct!

Q4: For $u = [1, 1]^T$, $v = [2, 0]^T$, $w = [1, -1]^T$, what is the front slice (k = 1) of u ⊗ v ⊗ w?

A) [[2, 0], [2, 0]] B) [[-2, 0], [-2, 0]] C) [[2, 2], [0, 0]] D) [[1, 0], [1, 0]]

Correct: A)

If you chose A: The front slice has entries u_i · v_j · w₁ with w₁ = 1, which gives [[2, 0], [2, 0]]. Correct!
If you chose B: This is incorrect. [[-2, 0], [-2, 0]] is the back slice (k = 2, w₂ = -1). The front slice is [[2, 0], [2, 0]].
If you chose C: This is incorrect. [[2, 2], [0, 0]] swaps the roles of u and v (it is the transpose). The front slice is [[2, 0], [2, 0]].
If you chose D: This is incorrect. This result drops the factor v₁ = 2. The front slice is [[2, 0], [2, 0]].

Q5: How are the Kronecker product and the CP decomposition related?

A) The Kronecker product computes the CP rank of a tensor B) The Kronecker product and the CP decomposition are completely unrelated topics C) The Kronecker product is the inverse of the CP decomposition D) Both are built on the tensor product: the Kronecker product arranges the entries of a tensor product of matrices as a matrix, and CP writes a tensor as a sum of tensor products of vectors

Correct: D)

If you chose A: This is incorrect. Determining the CP rank is NP-hard in general; the Kronecker product does not compute it.
If you chose B: This is incorrect. Both constructions are built on the tensor product ⊗.
If you chose C: This is incorrect. The Kronecker product combines matrices, whereas CP decomposes a tensor into rank-1 terms; neither undoes the other.
If you chose D: Both the Kronecker product and the rank-1 terms of a CP decomposition are instances of the tensor product. Correct!

Q6: What is a common pitfall when working with the Tucker decomposition?

A) Computing the SVD of each mode-k unfolding to obtain the factor matrices B) Allowing a different rank r_k for each mode C) Using the k-mode product to combine the core with the factor matrices D) Confusing it with the CP decomposition by assuming the core tensor must be diagonal

Correct: D)

If you chose A: This is incorrect. Taking singular vectors of the mode-k unfoldings is exactly the HOSVD procedure.
If you chose B: This is incorrect. Tucker allows a separate rank r_k for each mode.
If you chose C: This is incorrect. The k-mode product is how the Tucker decomposition is defined.
If you chose D: Tucker uses a full core tensor; a diagonal core is the special case that corresponds to CP. Correct!

Q7: When should you apply tensor contraction?

A) When building a higher-order tensor from two lower-order tensors without any summation B) When summing over a shared index of two tensors, as in inner products, matrix multiplication, and k-mode products C) Only when both tensors have the same order D) Only for tensors of order 3 or higher

Correct: B)

If you chose A: This is incorrect. Combining tensors without summation is the tensor product, which raises the order to p + q.
If you chose B: Contraction sums over a matching pair of indices and generalizes the inner product and matrix multiplication. Correct!
If you chose C: This is incorrect. The orders may differ, as in a matrix-vector product.
If you chose D: This is incorrect. The dot product of two vectors is already a contraction.

Practice Problems

1. Compute the tensor product u ⊗ v ⊗ w where $u = [1, 1]^T$, $v = [2, 0]^T$, $w = [1, -1]^T$. Give the 2×2×2 tensor entries.
2. Verify that the Kronecker product is NOT commutative by computing A ⊗ B and B ⊗ A for $A = [1, 0]$ and $B = [0, 1]$ (both treated as 1×2 matrices).
3. A 2×2×2 tensor 𝒳 has CP decomposition a ⊗ b ⊗ c where $a = [1, 2]^T$, $b = [3, 4]^T$, $c = [0, 5]^T$. What is 𝒳_{212}?
4. Write the expression for the mode-2 product $𝒴 = 𝒳 ×₂ M$ using the Einstein summation convention, where 𝒳 is order-3 and M is a matrix.
5. For the 2×2×2 tensor with entries $𝒳_{111}=1, 𝒳_{211}=2, 𝒳_{121}=3, 𝒳_{221}=4, 𝒳_{112}=5, 𝒳_{212}=6, 𝒳_{122}=7, 𝒳_{222}=8$, compute the mode-1 unfolding X_{(1)} as a 2×4 matrix.

Answers

1. 𝒳_{ijk} = u_i · v_j · w_k. Front slice (k=1, w₁=1): [[1·2·1, 1·0·1], [1·2·1, 1·0·1]] = [[2, 0], [2, 0]]. Back slice (k=2, w₂=-1): [[1·2·(-1), 1·0·(-1)], [1·2·(-1), 1·0·(-1)]] = [[-2, 0], [-2, 0]]. Listing the front slice first, 𝒳 = [[[2,0],[2,0]], [[-2,0],[-2,0]]].
2. A = [[1,0]] (1×2), B = [[0,1]] (1×2). A⊗B = [[0,1,0,0]] (1×4). B⊗A = [[0,0,1,0]] (1×4). Different! A⊗B ≠ B⊗A.
3. 𝒳_{212} = a₂ · b₁ · c₂ = 2 · 3 · 5 = 30.
4. 𝒴_{i, j, k} = 𝒳_{i, p, k} · M_{j, p} (sum over p; p is the mode-2 index). Here the new index j comes from M (output dimension), and i, k are the unchanged mode-1 and mode-3 indices.
5. Mode-1 unfolding: the rows are indexed by mode 1, and the columns run over (mode-2, mode-3) pairs with the mode-2 index varying fastest: (1,1), (2,1), (1,2), (2,2). Each column is the mode-1 fiber 𝒳_{:,j,k}, so X_{(1)} = [[1, 3, 5, 7], [2, 4, 6, 8]].
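The mode-1 unfolding in problem 5 can be reproduced in NumPy. This sketch assumes column-major (Fortran) ordering, which matches the convention that the first index varies fastest; the variable names are illustrative.

```python
import numpy as np

# Entries 1..8 are listed with the first index varying fastest, then the
# second, then the third, which is exactly Fortran (column-major) order.
X = np.arange(1, 9).reshape((2, 2, 2), order="F")

# Mode-1 unfolding: rows indexed by mode 1, columns by (mode-2, mode-3)
# pairs with the mode-2 index varying fastest.
X1 = X.reshape((2, 4), order="F")
print(X1)
```

With NumPy's default row-major `reshape` the columns would come out in a different order, so the `order="F"` argument matters here.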

Summary

Tensors generalize vectors (order 1) and matrices (order 2) to arbitrary order d: multilinear maps represented as d-dimensional arrays
The tensor product ⊗ builds higher-order tensors from lower-order ones: (A ⊗ B)_{i₁...i_p, j₁...j_q} = A_{i₁...i_p} B_{j₁...j_q}
CP decomposition factorizes a tensor as a sum of rank-1 tensors, generalizing the SVD of matrices
Tucker decomposition uses a core tensor multiplied by factor matrices along each mode; HOSVD provides a computable approximation
Einstein summation convention (C_{ij} = A_{ik} B_{kj}) compactly expresses tensor contractions, critical for deep learning tensor operations

Pitfalls

Confusing tensor order with tensor rank. Order is the number of modes/indices (a structural property). Rank is the minimum number of rank-1 terms needed to sum to the tensor (a decomposition property). A 3×4×5 tensor has order 3; its CP rank could be anything.
Assuming tensor rank is bounded by mode dimension. Unlike matrices where rank ≤ min(m, n), tensor rank can exceed every individual mode dimension. Determining CP rank is NP-hard in general.
Treating the Kronecker product as commutative. A ⊗ B ≠ B ⊗ A in general. The mixed-product property $(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)$ only holds when the inner dimensions match.
Assuming a best rank-R CP approximation always exists. For tensors of order 3 or higher, the set of tensors of rank at most R is not closed for R > 1 (unlike matrices), so a best rank-R approximation may not exist. Iterative algorithms can then produce factors whose norms grow without bound while the terms nearly cancel ("degenerate" CP).
Misapplying Einstein summation across incompatible index conventions. In standard physics convention, summation is over one upper and one lower index. In ML/data contexts where all indices are subscripts, repeated subscripts imply summation — be consistent within each derivation.

Next Steps

Continue to 09-09 Numerical Linear Algebra to learn about floating-point errors, stability, sparse matrices, and iterative methods.
