MLSYS ENGINEERING

2.3. Matrix multiplication (math)

Matrix multiplication is the single most widely used op in ML frameworks. In MLSys, it is often abbreviated as matmul.¹

First, let's do a quick recap of how matrix multiplication works mathematically, as you might recall from linear algebra.

We have two matrices A and B, and their product C = AB is illustrated in Figure 3. The figure shows how the top-right element of C, 38, is computed as the inner product of the first row of A, [3, 4, 5], and the second column of B, [3, 1, 5].

For every cell of C, we do the same inner product computation with the corresponding row of A and column of B. For example, the bottom-right element of C, 17, is computed as the inner product of [2, 6, 1] and [3, 1, 5].

A = [ 3  4  5 ]      B = [ 2  3 ]      C = AB = [ 54  38 ]
    [ 2  6  1 ]          [ 7  1 ]               [ 50  17 ]
                         [ 4  5 ]

38 = 3×3 + 4×1 + 5×5
Figure 3. Visual illustration of matmul.
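
The two cells discussed above can be checked with a few lines of plain Python. This is only a minimal sketch: the matrices are the ones from Figure 3, stored as nested lists, and the helper name `inner_product` is ours, not something from an ML framework.

```python
def inner_product(u, v):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

# Matrices from Figure 3, stored row by row.
A = [[3, 4, 5],
     [2, 6, 1]]
B = [[2, 3],
     [7, 1],
     [4, 5]]

# A column of B is not stored contiguously, so we gather it row by row.
second_col_of_B = [row[1] for row in B]   # [3, 1, 5]

top_right = inner_product(A[0], second_col_of_B)     # 3*3 + 4*1 + 5*5
bottom_right = inner_product(A[1], second_col_of_B)  # 2*3 + 6*1 + 1*5

print(top_right, bottom_right)  # prints: 38 17
```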

To compute an inner product, the two vectors must have the same length. In matmul, the two vectors are a row of A and a column of B. Since A has 3 numbers per row, B must have 3 numbers per column. In other words, the number of columns of A must equal the number of rows of B. Otherwise, the two matrices cannot be multiplied together.
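
The length requirement can be written as a small shape check. The sketch below assumes matrices are nested lists of rows; `can_multiply` is a hypothetical helper name used only for illustration.

```python
def can_multiply(A, B):
    """Return True if A @ B is defined.

    The length of each row of A (its number of columns) must equal
    the length of each column of B (its number of rows).
    """
    rows_in_B = len(B)
    return all(len(row) == rows_in_B for row in A)

A = [[3, 4, 5],
     [2, 6, 1]]      # 2 rows, 3 numbers per row
B = [[2, 3],
     [7, 1],
     [4, 5]]         # 3 rows, so 3 numbers per column

ok_AB = can_multiply(A, B)  # rows of A have length 3, columns of B have length 3
ok_AA = can_multiply(A, A)  # rows of A have length 3, columns of A have length 2
print(ok_AB, ok_AA)         # prints: True False
```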

Now, try to answer this question: How many inner products do we need to compute in total for a matmul op? Every row of A is paired with every column of B, so we need m × n inner products, given that A has m rows and B has n columns. The result is an m × n matrix. In Figure 3, m = 2 and n = 2, so C has 4 cells and takes 4 inner products.
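
To make the count concrete, here is a naive matmul that computes one inner product per cell of C and counts them along the way. It is a teaching sketch under the same nested-list assumption, not how a framework implements the op; `matmul_naive` and the returned counter are hypothetical names introduced for this example.

```python
def matmul_naive(A, B):
    """Multiply A (m x k) by B (k x n); return (C, number_of_inner_products)."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("rows of A and columns of B must have the same length")

    C = [[0] * n for _ in range(m)]
    inner_products = 0
    for i in range(m):          # every row of A ...
        for j in range(n):      # ... paired with every column of B
            C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
            inner_products += 1
    return C, inner_products

A = [[3, 4, 5],
     [2, 6, 1]]
B = [[2, 3],
     [7, 1],
     [4, 5]]

C, count = matmul_naive(A, B)
print(C)      # prints: [[54, 38], [50, 17]]
print(count)  # prints: 4
```

The two nested loops mirror the counting argument: the outer loop runs m times, the inner loop runs n times, and each iteration produces exactly one cell of C.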


1. We will use matmul as the abbreviation for matrix multiplication in the rest of the book.