Math graphic
๐Ÿ“ Concept diagram

09-08 โ€” Tensor Algebra Introduction

Phase: 9 โ€” Matrix Decompositions & Advanced Linear Algebra Subject: 09-08 Prerequisites: 09-07 โ€” Eigendecomposition Algorithms Next subject: 09-09 โ€” Numerical Linear Algebra


Learning Objectives

  1. Define tensors as multilinear maps and distinguish between covariant, contravariant, and mixed tensors
  2. Compute tensor products of vectors and vector spaces, and represent tensors using multi-dimensional arrays
  3. Understand rank-1 tensors and the CP (CANDECOMP/PARAFAC) decomposition as a generalization of matrix SVD
  4. Describe the Tucker decomposition and its relationship to higher-order SVD (HOSVD)
  5. Apply Einstein summation convention to compactly express tensor operations

Core Content

1. Tensors as Multilinear Maps

A tensor is a multilinear map that takes multiple vectors and/or covectors as input and returns a scalar.

A tensor of type (p, q) takes p covectors (row vectors) and q vectors as arguments:

$T: V* ร— ... ร— V* ร— V ร— ... ร— V โ†’ โ„
    โ””โ”€โ”€ p copies โ”€โ”€โ”˜   โ””โ”€โ”€ q copies โ”€โ”€โ”˜
$

In coordinates, once a basis {eโ‚, ..., e_n} of V is chosen, a (p, q)-tensor T is represented by an n^{p+q}-dimensional array:

T = ฮฃ T^{iโ‚...i_p}_{jโ‚...j_q}  e_{iโ‚} โŠ— ... โŠ— e_{i_p} โŠ— e^{jโ‚} โŠ— ... โŠ— e^{j_q}

For our purposes (data/ML applications), we typically work in Euclidean space with an orthonormal basis where the distinction between covariant and contravariant collapses, and we can think of tensors simply as multi-dimensional arrays.

Order-0 tensor: scalar (1 number) Order-1 tensor: vector (n numbers, 1 index) Order-2 tensor: matrix (nยฒ numbers, 2 indices) Order-3 tensor: "cube" (nยณ numbers, 3 indices) Order-d tensor: d-dimensional array (nแตˆ numbers)

2. Tensor Product (Kronecker Product for Matrices)

The tensor product of two vectors u โˆˆ โ„^m and v โˆˆ โ„^n is:

CRITICAL -- Foundational: Tensors generalize vectors/matrices to higher orders -- multilinear maps. Tensor product builds higher-order objects. Language of relativity, mechanics, and deep learning.

$(u โŠ— v)_{ij} = u_i v_j
$

This is an $m ร— n$ matrix of rank 1. Every rank-1 matrix can be written as u โŠ— v.

The tensor product of two vector spaces V โŠ— W has dimension $dim(V) ร— dim(W)$. If {e_i} is a basis for V and {f_j} for W, then {e_i โŠ— f_j} is a basis for V โŠ— W.

Tensor product of tensors: If A is an order-p tensor and B is an order-q tensor, A โŠ— B is an order-(p+q) tensor:

(A โŠ— B)_{iโ‚...i_p, jโ‚...j_q} = A_{iโ‚...i_p} ยท B_{jโ‚...j_q}

Kronecker product (for matrices): For A โˆˆ โ„^{mร—n} and B โˆˆ โ„^{pร—q}:

Common Pitfall: Kronecker product is NOT commutative: A ox B != B ox A. (A ox B)(C ox D) = (AC) ox (BD) only when dimensions match. Do not confuse with tensor product.

$A โŠ— B = [a_{11}B  a_{12}B  ...  a_{1n}B]
        [a_{21}B  a_{22}B  ...  a_{2n}B]
        [...                             ]
        [a_{m1}B  a_{m2}B  ...  a_{mn}B]
$

This is an $mp ร— nq$ matrix.

Key property: $(A โŠ— B)(C โŠ— D) = (AC) โŠ— (BD)$ (when dimensions match).

Vec operator: vec(A) stacks the columns of A into a single vector. The Kronecker product connects to the vec operator via:

$vec(A X B) = (B^T โŠ— A) vec(X)
$

3. Rank-1 Tensors and CP Decomposition

An order-d tensor ๐’ณ is rank-1 if it can be written as the tensor product of d vectors:

๐’ณ = a^{(1)} โŠ— a^{(2)} โŠ— ... โŠ— a^{(d)}

In component form: ๐’ณ_{iโ‚,iโ‚‚,...,i_d} = a^{(1)}_{iโ‚} ยท a^{(2)}_{iโ‚‚} ยท ... ยท a^{(d)}_{i_d}

The rank of a tensor is the minimum number of rank-1 tensors needed to sum to it:

๐’ณ = ฮฃ_{r=1}^{R} a_r^{(1)} โŠ— a_r^{(2)} โŠ— ... โŠ— a_r^{(d)}

This is the CP decomposition (CANDECOMP/PARAFAC), which generalizes the idea of matrix rank:

๐’ณ = ฮฃ_{r=1}^{R} ฮป_r ยท a_r^{(1)} โŠ— a_r^{(2)} โŠ— ... โŠ— a_r^{(d)}

where each a_r^{(k)} is a unit vector and $ฮป_r$ are weights.

For matrices (d=2): CP decomposition is essentially the SVD (sum of rank-1 matrices ฯƒ_r u_r โŠ— v_r).

Key differences from matrices: - The CP rank of a tensor may not be easily determined (NP-hard in general) - The best rank-R approximation may not exist (the set of rank-R tensors is not closed for R > 1)

Applications: Component analysis of multi-way data (e.g., EEG data: channels ร— time ร— trials; recommendation: users ร— items ร— context).

4. Tucker Decomposition and HOSVD

The Tucker decomposition factorizes a tensor into a core tensor multiplied by factor matrices along each mode:

๐’ณ โ‰ˆ ๐’ข ร—โ‚ A^{(1)} ร—โ‚‚ A^{(2)} ร—โ‚ƒ ... ร—_d A^{(d)}

where: - ๐’ข is the core tensor (rโ‚ ร— rโ‚‚ ร— ... ร— r_d) - A^{(k)} are factor matrices (n_k ร— r_k) - $ร—_k$ is the k-mode product

The k-mode product $๐’ด = ๐’ณ ร—_k M$ multiplies every mode-k fiber of ๐’ณ by M:

๐’ด_{iโ‚,...,j,...,i_d} = ฮฃ_{i_k} ๐’ณ_{iโ‚,...,i_k,...,i_d} ยท M_{j, i_k}

Higher-Order SVD (HOSVD): For each mode k, compute the SVD of the mode-k unfolding (matricization) X_{(k)} and take the left singular vectors as A^{(k)}. Then compute the core tensor as $๐’ข = ๐’ณ ร—โ‚ (A^{(1)})^T ร—โ‚‚ ... ร—_d (A^{(d)})^T$.

HOSVD provides a quasi-optimal rank-(rโ‚, ..., r_d) approximation (not optimal like Eckart-Young for matrices, but close).

Comparison: - CP: ๐’ณ = ฮฃ ฮป_r a_r^{(1)} โŠ— ... โŠ— a_r^{(d)} โ€” diagonal core (generalized SVD) - Tucker: ๐’ณ = ๐’ข ร—โ‚ A^{(1)} ... ร—_d A^{(d)} โ€” full core (generalized PCA) - CP is a special case of Tucker where the core is diagonal (r = rโ‚ = ... = r_d)

5. Einstein Summation Convention

When an index appears twice in a product (once as superscript, once as subscript), sum over it:

Vector dot product: u_i v^i means $ฮฃ_i u_i v_i$

Matrix-vector product: $y^i = A^i_j x^j$ means $y_i = ฮฃ_j A_{ij} x_j$

Matrix multiplication: $C^i_j = A^i_k B^k_j$ means C_{ij} = ฮฃ_k A_{ik} B_{kj}

Tensor contraction: ๐’ฏ^{ijk}_{kl} ๐’ฎ^l_{m} โ€” sum over k and l (indices appearing twice)

In ML/data contexts with all indices as subscripts: C_{ij} = A_{ik} B_{kj} efficiently describes matrix multiplication and its tensor generalizations without writing sum signs.

Tensor contraction of order-p and order-q tensors along matching modes yields an order-(p+q-2) tensor. This is the generalization of inner product and matrix multiplication to higher orders.

Example โ€” mode-1 product in Einstein notation: ๐’ด_{j, iโ‚‚, ..., i_d} = ๐’ณ_{iโ‚, iโ‚‚, ..., i_d} ยท M_{j, iโ‚} (sum over iโ‚).



Key Terms

Worked Examples

Example 1: Tensor Product of Vectors

Let $u = [1, 2]^T$, $v = [3, 4, 5]^T$. Their tensor product:

$u โŠ— v = [1ยท3  1ยท4  1ยท5]   [3   4   5]
        [2ยท3  2ยท4  2ยท5] = [6   8  10]
$

This is a rank-1 matrix. All rows are multiples of v^T, all columns are multiples of u.

Example 2: Kronecker Product

$A = [1  2]    B = [0  5]
    [3  4]        [6  7]
$
$A โŠ— B = [1ยทB  2ยทB]   [0   5   0  10]
        [3ยทB  4ยทB] = [6   7  12  14]
                      [0  15   0  20]
                      [18 21  24  28]
$

Verify property: $(A โŠ— B)(C โŠ— D) = (AC) โŠ— (BD)$ where $C = Iโ‚‚, D = Iโ‚‚$: $(A โŠ— B)(Iโ‚‚ โŠ— Iโ‚‚) = A โŠ— B$. Also $(A Iโ‚‚) โŠ— (B Iโ‚‚) = A โŠ— B$. โœ“

Example 3: CP Decomposition of a 2ร—2ร—2 Tensor

Consider the order-3 tensor ๐’ณ with entries:

$๐’ณ_{:,:,1} = [1  2]    ๐’ณ_{:,:,2} = [3  4]
            [2  4]                [6  8]
$

This is rank-1! Write it as a โŠ— b โŠ— c:

$a = [1, 2]^T     (mode-1 factors)
b = [1, 2]^T     (mode-2 factors)
c = [1, 2]^T     (mode-3 factors)
$

Verify: $๐’ณ_{ijk} = a_i ยท b_j ยท c_k$ - ๐’ณ_{111} = 1ยท1ยท1 = 1 โœ“ - ๐’ณ_{211} = 2ยท1ยท1 = 2 โœ“ - ๐’ณ_{112} = 1ยท2ยท1 = 2 โœ“ - ๐’ณ_{122} = 1ยท2ยท2 = 4 โœ“ - ๐’ณ_{212} = 2ยท1ยท2 = 4 โœ“ - ๐’ณ_{222} = 2ยท2ยท2 = 8 โœ“

All entries match.

Example 4: Einstein Summation

Matrix multiplication C = A B in Einstein notation: C_{ij} = A_{ik} B_{kj} (sum over k)

For $A = [[1,2],[3,4]]$, $B = [[5,6],[7,8]]$:

$C_{11} = A_{1k}B_{k1} = 1ยท5 + 2ยท7 = 19
C_{12} = A_{1k}B_{k2} = 1ยท6 + 2ยท8 = 22
C_{21} = A_{2k}B_{k1} = 3ยท5 + 4ยท7 = 43
C_{22} = A_{2k}B_{k2} = 3ยท6 + 4ยท8 = 50
$

Trace of A: tr(A) = A_{ii} (sum over i).


Quiz

Q1: What does the concept of Contravariant components primarily refer to in this subject?

A) The definition and application of Contravariant components B) A computational error related to Contravariant components C) A visual representation of Contravariant components D) A historical anecdote about Contravariant components

Correct: A)

Q2: What is the primary purpose of Covariant components?

A) It is used to covariant components in mathematical analysis B) It replaces all other methods in this domain C) It is used only in advanced research contexts D) It is primarily a historical notation system

Correct: A)

Q3: Which statement about Kronecker product is TRUE?

A) Kronecker product is not related to this subject B) Kronecker product is an advanced topic beyond this subject's scope C) Kronecker product is mentioned only as a historical footnote D) Kronecker product is a fundamental concept covered in this subject

Correct: D)

Q4: Based on the worked examples in this subject, what is the correct result?

A) [[2, 0], [2, 0]] B) A different result from a common mistake C) An unrelated numerical value D) The inverse of the correct answer

Correct: A)

Q5: How are Kronecker product and CP decomposition related?

A) Kronecker product is a special case of CP decomposition B) Kronecker product and CP decomposition are completely unrelated topics C) Kronecker product is the inverse of CP decomposition D) Kronecker product and CP decomposition are closely related concepts

Correct: D)

Q6: What is a common pitfall when working with Tucker decomposition?

A) The main error with Tucker decomposition is using it when it is not needed B) Tucker decomposition has no common misconceptions C) Tucker decomposition is always computed the same way in all contexts D) A common mistake is confusing Tucker decomposition with a similar concept

Correct: D)

Q7: When should you apply Tensor contraction?

A) Tensor contraction is not practically useful B) Apply Tensor contraction to solve problems in this subject's domain C) Avoid Tensor contraction unless explicitly instructed D) Use Tensor contraction only in pure mathematics contexts

Correct: B)

Practice Problems

  1. Compute the tensor product u โŠ— v โŠ— w where $u = [1, 1]^T$, $v = [2, 0]^T$, $w = [1, -1]^T$. Give the 2ร—2ร—2 tensor entries.

  2. Verify that the Kronecker product is NOT commutative by computing A โŠ— B and B โŠ— A for $A = [1, 0]$ and $B = [0, 1]$ (as 1ร—2 row vectors viewed as 1ร—2 matrices).

  3. A 2ร—2ร—2 tensor ๐’ณ has CP decomposition a โŠ— b โŠ— c where $a = [1, 2]^T$, $b = [3, 4]^T$, $c = [0, 5]^T$. What is ๐’ณ_{212}?

  4. Write the expression for the mode-2 product $๐’ด = ๐’ณ ร—โ‚‚ M$ using the Einstein summation convention, where ๐’ณ is order-3 and M is a matrix.

  5. For the 2ร—2ร—2 tensor with entries $๐’ณ_{111}=1, ๐’ณ_{211}=2, ๐’ณ_{121}=3, ๐’ณ_{221}=4, ๐’ณ_{112}=5, ๐’ณ_{212}=6, ๐’ณ_{122}=7, ๐’ณ_{222}=8$, compute the mode-1 unfolding X_{(1)} as a 2ร—4 matrix.

Answers 1. ๐’ณ_{ijk} = u_i ยท v_j ยท w_k. Front slice (k=1, wโ‚=1): [[1ยท2ยท1, 1ยท0ยท1], [1ยท2ยท1, 1ยท0ยท1]] = [[2, 0], [2, 0]] Back slice (k=2, wโ‚‚=-1): [[1ยท2ยท(-1), 1ยท0ยท(-1)], [1ยท2ยท(-1), 1ยท0ยท(-1)]] = [[-2, 0], [-2, 0]] So ๐’ณ = [[[2,0],[2,0]], [[-2,0],[-2,0]]]. 2. A = [[1,0]] (1ร—2), B = [[0,1]] (1ร—2). AโŠ—B = [[0,1,0,0]] (1ร—4). BโŠ—A = [[0,0,1,0]] (1ร—4). Different! AโŠ—B โ‰  BโŠ—A. 3. ๐’ณ_{212} = aโ‚‚ ยท bโ‚ ยท cโ‚‚ = 2 ยท 3 ยท 5 = 30. 4. ๐’ด_{i, j, k} = ๐’ณ_{i, p, k} ยท M_{j, p} (sum over p; p is the mode-2 index). Here the new index j comes from M (output dimension), and i, k are the unchanged mode-1 and mode-3 indices. 5. Mode-1 unfolding: mode-1 varies fastest, then mode-2, then mode-3. Columns correspond to (mode-2, mode-3) pairs: (1,1), (2,1), (1,2), (2,2). X_{(1)} = [[1, 3, 5, 7], [2, 4, 6, 8]]

Summary


Pitfalls



Next Steps

Continue to 09-09 Numerical Linear Algebra to learn about floating-point errors, stability, sparse matrices, and iterative methods.