2.3. Matrix multiplication (math)
Matrix multiplication is the single most widely used op in ML frameworks. In MLSys, it is often abbreviated as matmul¹.
First, let's do a quick recap of how matrix multiplication works mathematically, as you might recall from linear algebra.
We have two matrices A and B, and their multiplication C = AB is illustrated in Figure 3. The figure shows how the top-right element of C, 38, is computed as the inner product of the first row of A, [3, 4, 5], and the second column of B, [3, 1, 5]: 3 × 3 + 4 × 1 + 5 × 5 = 38.
For every cell of C, we do the same inner product computation with the corresponding row of A and column of B. For example, the bottom-right element of C, 17, is computed as the inner product of [2, 6, 1] and [3, 1, 5]: 2 × 3 + 6 × 1 + 1 × 5 = 17.
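As a quick check of the two cells discussed above, here is a minimal Python sketch of the inner product. The helper name inner_product is ours, chosen for illustration; the vectors are the ones quoted from Figure 3.

```python
def inner_product(u, v):
    """Multiply two equal-length vectors elementwise and sum the products."""
    if len(u) != len(v):
        raise ValueError("inner product needs vectors of the same length")
    return sum(a * b for a, b in zip(u, v))


# First row of A with the second column of B: the top-right cell of C.
top_right = inner_product([3, 4, 5], [3, 1, 5])

# The row [2, 6, 1] of A with the same column of B: the bottom-right cell of C.
bottom_right = inner_product([2, 6, 1], [3, 1, 5])

print(top_right, bottom_right)  # prints: 38 17
```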
To compute an inner product, the two vectors must have the same length. In matmul, the two vectors are a row of A and a column of B. Since A has 3 numbers per row, B must have 3 numbers per column; in other words, the number of columns of A must equal the number of rows of B. Otherwise, the two matrices cannot be multiplied.
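The length requirement can be written as a small shape check. This is only a sketch: the function name matmul_shape and the (rows, columns) tuple convention are assumptions made for illustration, not part of any framework's API.

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of A @ B, or raise if the inner dimensions differ.

    Shapes are (rows, columns) tuples; this convention is an assumption
    of the sketch.
    """
    m, a_cols = a_shape
    b_rows, n = b_shape
    if a_cols != b_rows:
        raise ValueError(
            f"cannot multiply: A has {a_cols} numbers per row, "
            f"but B has {b_rows} numbers per column"
        )
    return (m, n)


# A with 3 numbers per row times B with 3 numbers per column is fine.
ok_shape = matmul_shape((2, 3), (3, 2))
print(ok_shape)  # prints: (2, 2)
```

Calling matmul_shape((2, 3), (2, 3)) raises a ValueError, because a row of 3 numbers cannot be paired with a column of 2 numbers.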
Now, try to answer this question: how many inner products do we need to compute in total for a matmul op? Every row of A is paired with every column of B, so if A has m rows and B has n columns, we compute m × n inner products, one for each cell of the resulting m × n matrix.
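To make the count concrete, the sketch below implements a naive matmul in plain Python and counts the inner products as it goes. Two things here are assumptions: A is taken to have just the two rows quoted earlier, and the first column of B is a placeholder ([1, 0, 2]), since only its second column is given in the text. The function name matmul_counting is also ours.

```python
def matmul_counting(A, B):
    """Naive matmul on lists of lists; also returns the number of inner products."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("each row of A must be as long as each column of B")

    C = [[0] * n for _ in range(m)]
    count = 0
    for i in range(m):          # every row of A ...
        for j in range(n):      # ... paired with every column of B
            column_j = [B[p][j] for p in range(k)]
            C[i][j] = sum(a * b for a, b in zip(A[i], column_j))
            count += 1
    return C, count


A = [[3, 4, 5],
     [2, 6, 1]]
# Second column is from the text; the first column is a made-up placeholder.
B = [[1, 3],
     [0, 1],
     [2, 5]]

C, count = matmul_counting(A, B)
print(C)      # prints: [[13, 38], [4, 17]]
print(count)  # prints: 4
```

With m = 2 rows in A and n = 2 columns in B, the loop body runs 2 × 2 = 4 times, and the right-hand column of C reproduces the 38 and 17 from the worked example.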