Documentation ¶
Overview ¶
Package mat is a golang matrix package with cache-aware lock-free tiling optimization.
The following contents explains how multiplication algorithm works.
Prior knowledge ¶
Assume only 2 levels in the hierarchy, fast(registers/cache) and slow(main memory). All data initially in slow memory
m: number of memory elements (words) moved between fast and slow memory t_m: time per slow memory operation f: number of arithemetic operations t_f: time per arithmetic operation (t_f << t_m) q = f/m: computational intensity (key to algorithm efficiency) average number of flops per slow memory access Minimum possible time = f * t_f when all data in fast memory. Actual time = f * t_f + m * t_m = f * t_f * [1 + (t_m / t_f) * (1 / q)] Machine balance a = t_m / t_f (key to machine efficiency)
Larger q means time closer to minimum f*t_f
q >= t_m / t_f needed to get at least half of peak speed
Tiled matrix multiply
C = A·B = / \ / \ / \ | A11 A12 | | B11 B12 | | A11·B11+A12·B21 A11·B12+A12·B22 | | | | | == | | | A21 A22 | | B21 B22 | | A21·B11+A22·B22 A21·B12+A22·B22 | \ / \ / \ /
Consider A, B, C to be N-by-N matrices of b-by-b sub-blocks:
b = n / N is called the *block size*
Thus
m = N*n*n read each block of B N*N*N times (N*N*N * b*b = N*N*N * (n/N)*(n/N) = N*n*n) + N*n*n read each block of A N*N*N times + 2*n*n read and write each block of C once = 2*(N+1)*n*n
Hence, computational intensity q = f/m = 2*n*n*n / (2*(N+1)*n*n)
q ~= n/N = b (for large n)
So we can improve performance by increasing the blocksize b Can be much faster than matrix-vector multiply (q=2)
The blocked algorithm has computational intensity q ~= b: the large the block size, the more efficient algorithm will be
Limit: All three blocks from A, B, C must fit in fast memory (cache), so we cannot make these blocks arbitrarily large.
Assume our fast memory has size M_fast:
3*b*b <= M_fast, so q ~= b <= sqrt(M_fast / 3)
To build a machine to run matrix multiply at 1/2 peak arthmetic speed of the machine, we need a fast memory of size
M_fast >= 3*b*b ~= 3*q*q = 3*(t_m / t_f)*(t_m / t_f)
This size is reasonable for L1 cache, but not for register sets Note: analysis assumes it is possible to schedule the instructions perfectly.
Limits ¶
The tiled algorithm changes the order in which values are accumulated into each C[i, j] by applying commutativity and associativity: Get slightly different answers from naive version, because of roundoff.
The previous anlysis showed that the blocked algorithm has computational intensity:
q ~= b <= sqrt(M_fast / 3)
There is a lower bound result that says we cannot do any better than this (using only associativity):
Theorem (Hong & Kong, 1981): Any reorganization of this algorithm (that uses only associativity) is limited to:
q = O(sqrt(M_fast))
Index ¶
- Variables
- func NewDense(m, n int) func(...float64) (*Dense, error)
- func NewDenseP(m, n int) func(...float64) (*Dense, error)
- type Dense
- func (A *Dense) Add(B Matrix) error
- func (A *Dense) At(i, j int) float64
- func (A *Dense) Col() int
- func (A *Dense) Dot(B, C Matrix) (err error)
- func (A *Dense) DotBlock(B, C Matrix) (err error)
- func (A *Dense) DotBlockIJK(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockIJKP(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockIKJ(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockIKJP(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockJIK(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockJIKP(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockJKI(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockJKIP(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockKIJ(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockKIJP(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockKJI(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockKJIP(blockSize int, B, C Matrix) (err error)
- func (A *Dense) DotBlockP(B, C Matrix) (err error)
- func (A *Dense) DotNaive(B, C Matrix) (err error)
- func (A *Dense) DotNaiveIJK(B, C Matrix) (err error)
- func (A *Dense) DotNaiveIJKP(B, C Matrix) (err error)
- func (A *Dense) DotNaiveIKJ(B, C Matrix) (err error)
- func (A *Dense) DotNaiveIKJP(B, C Matrix) (err error)
- func (A *Dense) DotNaiveJIK(B, C Matrix) (err error)
- func (A *Dense) DotNaiveJIKP(B, C Matrix) (err error)
- func (A *Dense) DotNaiveJKI(B, C Matrix) (err error)
- func (A *Dense) DotNaiveJKIP(B, C Matrix) (err error)
- func (A *Dense) DotNaiveKIJ(B, C Matrix) (err error)
- func (A *Dense) DotNaiveKIJP(B, C Matrix) (err error)
- func (A *Dense) DotNaiveKJI(B, C Matrix) (err error)
- func (A *Dense) DotNaiveKJIP(B, C Matrix) (err error)
- func (A *Dense) DotNaiveP(B, C Matrix) (err error)
- func (A *Dense) DotP(B, C Matrix) (err error)
- func (A *Dense) Equal(B Matrix) bool
- func (A *Dense) EqualShape(B Matrix) bool
- func (A *Dense) Inc(i, j int, val float64)
- func (A *Dense) Mult(i, j int, val float64)
- func (A *Dense) Pow(i, j int, n float64)
- func (A *Dense) Print()
- func (A *Dense) Row() int
- func (A *Dense) Set(i, j int, val float64)
- func (A *Dense) Size() (int, int)
- type Matrix
Constants ¶
This section is empty.
Variables ¶
var ( ErrNumElem = errors.New("bad number of elements") ErrMatSize = errors.New("bad size of matrix") )
Errors
Functions ¶
Types ¶
type Dense ¶
type Dense struct {
// contains filtered or unexported fields
}
Dense implements Matrix interface dense matrix underlying data struct
func (*Dense) DotBlock ¶
DotBlock matrix multiplication Use JIK block 36 version here, see ./benchmark/README.md
func (*Dense) DotBlockIJK ¶
DotBlockIJK block matrix multiplication
func (*Dense) DotBlockIJKP ¶
DotBlockIJKP block matrix multiplication
func (*Dense) DotBlockIKJ ¶
DotBlockIKJ block matrix multiplication
func (*Dense) DotBlockIKJP ¶
DotBlockIKJP block matrix multiplication
func (*Dense) DotBlockJIK ¶
DotBlockJIK block matrix multiplication
func (*Dense) DotBlockJIKP ¶
DotBlockJIKP block matrix multiplication
func (*Dense) DotBlockJKI ¶
DotBlockJKI block matrix multiplication
func (*Dense) DotBlockJKIP ¶
DotBlockJKIP block matrix multiplication
func (*Dense) DotBlockKIJ ¶
DotBlockKIJ block matrix multiplication
func (*Dense) DotBlockKIJP ¶
DotBlockKIJP block matrix multiplication
func (*Dense) DotBlockKJI ¶
DotBlockKJI block matrix multiplication
func (*Dense) DotBlockKJIP ¶
DotBlockKJIP block matrix multiplication
func (*Dense) DotBlockP ¶
DotBlockP matrix multiplication Use JIKP block 36 version here, see ./benchmark/README.md
func (*Dense) DotNaive ¶
DotNaive matrix multiplication O(n^3) Use JIK version here, see ./benchmark/README.md
func (*Dense) DotNaiveIJK ¶
DotNaiveIJK matrix multiplication O(n^3)
func (*Dense) DotNaiveIJKP ¶
DotNaiveIJKP matrix multiplication O(n^3)
func (*Dense) DotNaiveIKJ ¶
DotNaiveIKJ matrix multiplication O(n^3)
func (*Dense) DotNaiveIKJP ¶
DotNaiveIKJP matrix multiplication O(n^3)
func (*Dense) DotNaiveJIK ¶
DotNaiveJIK matrix multiplication O(n^3)
func (*Dense) DotNaiveJIKP ¶
DotNaiveJIKP matrix multiplication O(n^3)
func (*Dense) DotNaiveJKI ¶
DotNaiveJKI matrix multiplication O(n^3)
func (*Dense) DotNaiveJKIP ¶
DotNaiveJKIP matrix multiplication O(n^3)
func (*Dense) DotNaiveKIJ ¶
DotNaiveKIJ matrix multiplication O(n^3)
func (*Dense) DotNaiveKIJP ¶
DotNaiveKIJP matrix multiplication O(n^3)
func (*Dense) DotNaiveKJI ¶
DotNaiveKJI matrix multiplication O(n^3)
func (*Dense) DotNaiveKJIP ¶
DotNaiveKJIP matrix multiplication O(n^3)
func (*Dense) DotNaiveP ¶
DotNaiveP matrix multiplication O(n^3) Use JIKP version here, see ./benchmark/README.md
func (*Dense) EqualShape ¶
EqualShape check A.Size() == B.Size()
Source Files ¶
- doc.go
- interface.go
- mat.go
- matadd.go
- mateigen.go
- matmult.go
- matmult_block.go
- matmult_block_ijk.go
- matmult_block_ikj.go
- matmult_block_jik.go
- matmult_block_jki.go
- matmult_block_kij.go
- matmult_block_kji.go
- matmult_naive.go
- matmult_naive_ijk.go
- matmult_naive_ikj.go
- matmult_naive_jik.go
- matmult_naive_jki.go
- matmult_naive_kij.go
- matmult_naive_kji.go