09-08 โ Tensor Algebra Introduction
Phase: 9 โ Matrix Decompositions & Advanced Linear Algebra Subject: 09-08 Prerequisites: 09-07 โ Eigendecomposition Algorithms Next subject: 09-09 โ Numerical Linear Algebra
Learning Objectives
- Define tensors as multilinear maps and distinguish between covariant, contravariant, and mixed tensors
- Compute tensor products of vectors and vector spaces, and represent tensors using multi-dimensional arrays
- Understand rank-1 tensors and the CP (CANDECOMP/PARAFAC) decomposition as a generalization of matrix SVD
- Describe the Tucker decomposition and its relationship to higher-order SVD (HOSVD)
- Apply Einstein summation convention to compactly express tensor operations
Core Content
1. Tensors as Multilinear Maps
A tensor is a multilinear map that takes multiple vectors and/or covectors as input and returns a scalar.
A tensor of type (p, q) takes p covectors (row vectors) and q vectors as arguments:
$T: V* ร ... ร V* ร V ร ... ร V โ โ
โโโ p copies โโโ โโโ q copies โโโ
$
- Contravariant components (upper indices): transform like vectors (with the basis)
- Covariant components (lower indices): transform like covectors (against the basis)
In coordinates, once a basis {eโ, ..., e_n} of V is chosen, a (p, q)-tensor T is represented by an n^{p+q}-dimensional array:
T = ฮฃ T^{iโ...i_p}_{jโ...j_q} e_{iโ} โ ... โ e_{i_p} โ e^{jโ} โ ... โ e^{j_q}
For our purposes (data/ML applications), we typically work in Euclidean space with an orthonormal basis where the distinction between covariant and contravariant collapses, and we can think of tensors simply as multi-dimensional arrays.
Order-0 tensor: scalar (1 number) Order-1 tensor: vector (n numbers, 1 index) Order-2 tensor: matrix (nยฒ numbers, 2 indices) Order-3 tensor: "cube" (nยณ numbers, 3 indices) Order-d tensor: d-dimensional array (nแต numbers)
2. Tensor Product (Kronecker Product for Matrices)
The tensor product of two vectors u โ โ^m and v โ โ^n is:
CRITICAL -- Foundational: Tensors generalize vectors/matrices to higher orders -- multilinear maps. Tensor product builds higher-order objects. Language of relativity, mechanics, and deep learning.
$(u โ v)_{ij} = u_i v_j
$
This is an $m ร n$ matrix of rank 1. Every rank-1 matrix can be written as u โ v.
The tensor product of two vector spaces V โ W has dimension $dim(V) ร dim(W)$. If {e_i} is a basis for V and {f_j} for W, then {e_i โ f_j} is a basis for V โ W.
Tensor product of tensors: If A is an order-p tensor and B is an order-q tensor, A โ B is an order-(p+q) tensor:
(A โ B)_{iโ...i_p, jโ...j_q} = A_{iโ...i_p} ยท B_{jโ...j_q}
Kronecker product (for matrices): For A โ โ^{mรn} and B โ โ^{pรq}:
Common Pitfall: Kronecker product is NOT commutative: A ox B != B ox A. (A ox B)(C ox D) = (AC) ox (BD) only when dimensions match. Do not confuse with tensor product.
$A โ B = [a_{11}B a_{12}B ... a_{1n}B]
[a_{21}B a_{22}B ... a_{2n}B]
[... ]
[a_{m1}B a_{m2}B ... a_{mn}B]
$
This is an $mp ร nq$ matrix.
Key property: $(A โ B)(C โ D) = (AC) โ (BD)$ (when dimensions match).
Vec operator: vec(A) stacks the columns of A into a single vector. The Kronecker product connects to the vec operator via:
$vec(A X B) = (B^T โ A) vec(X) $
3. Rank-1 Tensors and CP Decomposition
An order-d tensor ๐ณ is rank-1 if it can be written as the tensor product of d vectors:
๐ณ = a^{(1)} โ a^{(2)} โ ... โ a^{(d)}
In component form: ๐ณ_{iโ,iโ,...,i_d} = a^{(1)}_{iโ} ยท a^{(2)}_{iโ} ยท ... ยท a^{(d)}_{i_d}
The rank of a tensor is the minimum number of rank-1 tensors needed to sum to it:
๐ณ = ฮฃ_{r=1}^{R} a_r^{(1)} โ a_r^{(2)} โ ... โ a_r^{(d)}
This is the CP decomposition (CANDECOMP/PARAFAC), which generalizes the idea of matrix rank:
๐ณ = ฮฃ_{r=1}^{R} ฮป_r ยท a_r^{(1)} โ a_r^{(2)} โ ... โ a_r^{(d)}
where each a_r^{(k)} is a unit vector and $ฮป_r$ are weights.
For matrices (d=2): CP decomposition is essentially the SVD (sum of rank-1 matrices ฯ_r u_r โ v_r).
Key differences from matrices: - The CP rank of a tensor may not be easily determined (NP-hard in general) - The best rank-R approximation may not exist (the set of rank-R tensors is not closed for R > 1)
Applications: Component analysis of multi-way data (e.g., EEG data: channels ร time ร trials; recommendation: users ร items ร context).
4. Tucker Decomposition and HOSVD
The Tucker decomposition factorizes a tensor into a core tensor multiplied by factor matrices along each mode:
๐ณ โ ๐ข รโ A^{(1)} รโ A^{(2)} รโ ... ร_d A^{(d)}
where:
- ๐ข is the core tensor (rโ ร rโ ร ... ร r_d)
- A^{(k)} are factor matrices (n_k ร r_k)
- $ร_k$ is the k-mode product
The k-mode product $๐ด = ๐ณ ร_k M$ multiplies every mode-k fiber of ๐ณ by M:
๐ด_{iโ,...,j,...,i_d} = ฮฃ_{i_k} ๐ณ_{iโ,...,i_k,...,i_d} ยท M_{j, i_k}
Higher-Order SVD (HOSVD): For each mode k, compute the SVD of the mode-k unfolding (matricization) X_{(k)} and take the left singular vectors as A^{(k)}. Then compute the core tensor as $๐ข = ๐ณ รโ (A^{(1)})^T รโ ... ร_d (A^{(d)})^T$.
HOSVD provides a quasi-optimal rank-(rโ, ..., r_d) approximation (not optimal like Eckart-Young for matrices, but close).
Comparison:
- CP: ๐ณ = ฮฃ ฮป_r a_r^{(1)} โ ... โ a_r^{(d)} โ diagonal core (generalized SVD)
- Tucker: ๐ณ = ๐ข รโ A^{(1)} ... ร_d A^{(d)} โ full core (generalized PCA)
- CP is a special case of Tucker where the core is diagonal (r = rโ = ... = r_d)
5. Einstein Summation Convention
When an index appears twice in a product (once as superscript, once as subscript), sum over it:
Vector dot product: u_i v^i means $ฮฃ_i u_i v_i$
Matrix-vector product: $y^i = A^i_j x^j$ means $y_i = ฮฃ_j A_{ij} x_j$
Matrix multiplication: $C^i_j = A^i_k B^k_j$ means C_{ij} = ฮฃ_k A_{ik} B_{kj}
Tensor contraction: ๐ฏ^{ijk}_{kl} ๐ฎ^l_{m} โ sum over k and l (indices appearing twice)
In ML/data contexts with all indices as subscripts: C_{ij} = A_{ik} B_{kj} efficiently describes matrix multiplication and its tensor generalizations without writing sum signs.
Tensor contraction of order-p and order-q tensors along matching modes yields an order-(p+q-2) tensor. This is the generalization of inner product and matrix multiplication to higher orders.
Example โ mode-1 product in Einstein notation:
๐ด_{j, iโ, ..., i_d} = ๐ณ_{iโ, iโ, ..., i_d} ยท M_{j, iโ} (sum over iโ).
Key Terms
- CP decomposition
- Contravariant components
- Covariant components
- Kronecker product
- Tensor contraction
- Tucker decomposition
Worked Examples
Example 1: Tensor Product of Vectors
Let $u = [1, 2]^T$, $v = [3, 4, 5]^T$. Their tensor product:
$u โ v = [1ยท3 1ยท4 1ยท5] [3 4 5]
[2ยท3 2ยท4 2ยท5] = [6 8 10]
$
This is a rank-1 matrix. All rows are multiples of v^T, all columns are multiples of u.
Example 2: Kronecker Product
$A = [1 2] B = [0 5]
[3 4] [6 7]
$
$A โ B = [1ยทB 2ยทB] [0 5 0 10]
[3ยทB 4ยทB] = [6 7 12 14]
[0 15 0 20]
[18 21 24 28]
$
Verify property: $(A โ B)(C โ D) = (AC) โ (BD)$ where $C = Iโ, D = Iโ$: $(A โ B)(Iโ โ Iโ) = A โ B$. Also $(A Iโ) โ (B Iโ) = A โ B$. โ
Example 3: CP Decomposition of a 2ร2ร2 Tensor
Consider the order-3 tensor ๐ณ with entries:
$๐ณ_{:,:,1} = [1 2] ๐ณ_{:,:,2} = [3 4]
[2 4] [6 8]
$
This is rank-1! Write it as a โ b โ c:
$a = [1, 2]^T (mode-1 factors) b = [1, 2]^T (mode-2 factors) c = [1, 2]^T (mode-3 factors) $
Verify: $๐ณ_{ijk} = a_i ยท b_j ยท c_k$ - ๐ณ_{111} = 1ยท1ยท1 = 1 โ - ๐ณ_{211} = 2ยท1ยท1 = 2 โ - ๐ณ_{112} = 1ยท2ยท1 = 2 โ - ๐ณ_{122} = 1ยท2ยท2 = 4 โ - ๐ณ_{212} = 2ยท1ยท2 = 4 โ - ๐ณ_{222} = 2ยท2ยท2 = 8 โ
All entries match.
Example 4: Einstein Summation
Matrix multiplication C = A B in Einstein notation:
C_{ij} = A_{ik} B_{kj} (sum over k)
For $A = [[1,2],[3,4]]$, $B = [[5,6],[7,8]]$:
$C_{11} = A_{1k}B_{k1} = 1ยท5 + 2ยท7 = 19
C_{12} = A_{1k}B_{k2} = 1ยท6 + 2ยท8 = 22
C_{21} = A_{2k}B_{k1} = 3ยท5 + 4ยท7 = 43
C_{22} = A_{2k}B_{k2} = 3ยท6 + 4ยท8 = 50
$
Trace of A: tr(A) = A_{ii} (sum over i).
Quiz
Q1: What does the concept of Contravariant components primarily refer to in this subject?
A) The definition and application of Contravariant components B) A computational error related to Contravariant components C) A visual representation of Contravariant components D) A historical anecdote about Contravariant components
Correct: A)
- If you chose A: Contravariant components is defined as: the definition and application of contravariant components. The other options describe different aspects that are not the primary focus. Correct!
- If you chose B: This is incorrect. Contravariant components is defined as: the definition and application of contravariant components. The other options describe different aspects that are not the primary focus.
- If you chose C: This is incorrect. Contravariant components is defined as: the definition and application of contravariant components. The other options describe different aspects that are not the primary focus.
- If you chose D: This is incorrect. Contravariant components is defined as: the definition and application of contravariant components. The other options describe different aspects that are not the primary focus.
Q2: What is the primary purpose of Covariant components?
A) It is used to covariant components in mathematical analysis B) It replaces all other methods in this domain C) It is used only in advanced research contexts D) It is primarily a historical notation system
Correct: A)
- If you chose A: Covariant components serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
- If you chose B: This is incorrect. Covariant components serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose C: This is incorrect. Covariant components serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose D: This is incorrect. Covariant components serves the purpose described in the correct answer. The other options misrepresent its role.
Q3: Which statement about Kronecker product is TRUE?
A) Kronecker product is not related to this subject B) Kronecker product is an advanced topic beyond this subject's scope C) Kronecker product is mentioned only as a historical footnote D) Kronecker product is a fundamental concept covered in this subject
Correct: D)
- If you chose A: This is incorrect. Kronecker product is a fundamental concept covered in this subject. This subject covers Kronecker product as part of its core content.
- If you chose B: This is incorrect. Kronecker product is a fundamental concept covered in this subject. This subject covers Kronecker product as part of its core content.
- If you chose C: This is incorrect. Kronecker product is a fundamental concept covered in this subject. This subject covers Kronecker product as part of its core content.
- If you chose D: Kronecker product is a fundamental concept covered in this subject. This subject covers Kronecker product as part of its core content. Correct!
Q4: Based on the worked examples in this subject, what is the correct result?
A) [[2, 0], [2, 0]] B) A different result from a common mistake C) An unrelated numerical value D) The inverse of the correct answer
Correct: A)
- If you chose A: The worked examples show that the result is [[2, 0], [2, 0]]. The other options represent common errors. Correct!
- If you chose B: This is incorrect. The worked examples show that the result is [[2, 0], [2, 0]]. The other options represent common errors.
- If you chose C: This is incorrect. The worked examples show that the result is [[2, 0], [2, 0]]. The other options represent common errors.
- If you chose D: This is incorrect. The worked examples show that the result is [[2, 0], [2, 0]]. The other options represent common errors.
Q5: How are Kronecker product and CP decomposition related?
A) Kronecker product is a special case of CP decomposition B) Kronecker product and CP decomposition are completely unrelated topics C) Kronecker product is the inverse of CP decomposition D) Kronecker product and CP decomposition are closely related concepts
Correct: D)
- If you chose A: This is incorrect. Both Kronecker product and CP decomposition are covered in this subject as interconnected topics.
- If you chose B: This is incorrect. Both Kronecker product and CP decomposition are covered in this subject as interconnected topics.
- If you chose C: This is incorrect. Both Kronecker product and CP decomposition are covered in this subject as interconnected topics.
- If you chose D: Both Kronecker product and CP decomposition are covered in this subject as interconnected topics. Correct!
Q6: What is a common pitfall when working with Tucker decomposition?
A) The main error with Tucker decomposition is using it when it is not needed B) Tucker decomposition has no common misconceptions C) Tucker decomposition is always computed the same way in all contexts D) A common mistake is confusing Tucker decomposition with a similar concept
Correct: D)
- If you chose A: This is incorrect. Students often confuse Tucker decomposition with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose B: This is incorrect. Students often confuse Tucker decomposition with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose C: This is incorrect. Students often confuse Tucker decomposition with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose D: Students often confuse Tucker decomposition with similar-sounding or related concepts. Pay attention to the precise definitions. Correct!
Q7: When should you apply Tensor contraction?
A) Tensor contraction is not practically useful B) Apply Tensor contraction to solve problems in this subject's domain C) Avoid Tensor contraction unless explicitly instructed D) Use Tensor contraction only in pure mathematics contexts
Correct: B)
- If you chose A: This is incorrect. Tensor contraction is a practical tool used throughout this subject to solve relevant problems.
- If you chose B: Tensor contraction is a practical tool used throughout this subject to solve relevant problems. Correct!
- If you chose C: This is incorrect. Tensor contraction is a practical tool used throughout this subject to solve relevant problems.
- If you chose D: This is incorrect. Tensor contraction is a practical tool used throughout this subject to solve relevant problems.
Practice Problems
-
Compute the tensor product
u โ v โ wwhere $u = [1, 1]^T$, $v = [2, 0]^T$, $w = [1, -1]^T$. Give the 2ร2ร2 tensor entries. -
Verify that the Kronecker product is NOT commutative by computing
A โ BandB โ Afor $A = [1, 0]$ and $B = [0, 1]$ (as 1ร2 row vectors viewed as 1ร2 matrices). -
A 2ร2ร2 tensor
๐ณhas CP decompositiona โ b โ cwhere $a = [1, 2]^T$, $b = [3, 4]^T$, $c = [0, 5]^T$. What is๐ณ_{212}? -
Write the expression for the mode-2 product $๐ด = ๐ณ รโ M$ using the Einstein summation convention, where
๐ณis order-3 andMis a matrix. -
For the 2ร2ร2 tensor with entries $๐ณ_{111}=1, ๐ณ_{211}=2, ๐ณ_{121}=3, ๐ณ_{221}=4, ๐ณ_{112}=5, ๐ณ_{212}=6, ๐ณ_{122}=7, ๐ณ_{222}=8$, compute the mode-1 unfolding
X_{(1)}as a 2ร4 matrix.
Answers
1. ๐ณ_{ijk} = u_i ยท v_j ยท w_k. Front slice (k=1, wโ=1): [[1ยท2ยท1, 1ยท0ยท1], [1ยท2ยท1, 1ยท0ยท1]] = [[2, 0], [2, 0]] Back slice (k=2, wโ=-1): [[1ยท2ยท(-1), 1ยท0ยท(-1)], [1ยท2ยท(-1), 1ยท0ยท(-1)]] = [[-2, 0], [-2, 0]] So ๐ณ = [[[2,0],[2,0]], [[-2,0],[-2,0]]]. 2. A = [[1,0]] (1ร2), B = [[0,1]] (1ร2). AโB = [[0,1,0,0]] (1ร4). BโA = [[0,0,1,0]] (1ร4). Different! AโB โ BโA. 3. ๐ณ_{212} = aโ ยท bโ ยท cโ = 2 ยท 3 ยท 5 = 30. 4. ๐ด_{i, j, k} = ๐ณ_{i, p, k} ยท M_{j, p} (sum over p; p is the mode-2 index). Here the new index j comes from M (output dimension), and i, k are the unchanged mode-1 and mode-3 indices. 5. Mode-1 unfolding: mode-1 varies fastest, then mode-2, then mode-3. Columns correspond to (mode-2, mode-3) pairs: (1,1), (2,1), (1,2), (2,2). X_{(1)} = [[1, 3, 5, 7], [2, 4, 6, 8]]Summary
- Tensors generalize vectors (order 1) and matrices (order 2) to arbitrary order
d: multilinear maps represented as d-dimensional arrays - The tensor product โ builds higher-order tensors from lower-order ones:
(A โ B)_{iโ...i_p, jโ...j_q} = A_{iโ...i_p} B_{jโ...j_q} - CP decomposition factorizes a tensor as a sum of rank-1 tensors, generalizing the SVD of matrices
- Tucker decomposition uses a core tensor multiplied by factor matrices along each mode; HOSVD provides a computable approximation
- Einstein summation convention (
C_{ij} = A_{ik} B_{kj}) compactly expresses tensor contractions, critical for deep learning tensor operations
Pitfalls
- Confusing tensor order with tensor rank. Order is the number of modes/indices (a structural property). Rank is the minimum number of rank-1 terms needed to sum to the tensor (a decomposition property). A 3ร4ร5 tensor has order 3; its CP rank could be anything.
- Assuming tensor rank is bounded by mode dimension. Unlike matrices where rank โค min(m, n), tensor rank can exceed every individual mode dimension. Determining CP rank is NP-hard in general.
- Treating the Kronecker product as commutative. A โ B โ B โ A in general. The mixed-product property $(A โ B)(C โ D) = (AC) โ (BD)$ only holds when the inner dimensions match.
- Assuming the CP decomposition always exists for a given rank. The set of rank-R tensors is not closed for R > 1 (unlike matrices). The best rank-R approximation may not exist โ iterative algorithms can diverge ("degenerate" CP).
- Misapplying Einstein summation across incompatible index conventions. In standard physics convention, summation is over one upper and one lower index. In ML/data contexts where all indices are subscripts, repeated subscripts imply summation โ be consistent within each derivation.
Next Steps
Continue to 09-09 Numerical Linear Algebra to learn about floating-point errors, stability, sparse matrices, and iterative methods.