Documentation
Overview ¶
Package with multi-GPU primitives like array allocation, copying, ...
3D Array indexing.
Internal dimensions are labeled (I,J,K), I being the outermost dimension, K the innermost. A typical loop reads:
for i := 0; i < N0; i++ {
	for j := 0; j < N1; j++ {
		for k := 0; k < N2; k++ {
			...
		}
	}
}
I may be a small dimension, but K should preferably be large and alignable in CUDA memory.
The underlying contiguous storage is indexed as:
index := i*N1*N2 + j*N2 + k
The "internal" (I,J,K) dimensions correspond to the "user" dimensions (Z,Y,X)! Z is typically the smallest dimension like the thickness.
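The indexing rule above can be checked on the host with a short sketch. The helper name flatIndex and the sizes used are illustrative assumptions, not part of this package.

```go
package main

import "fmt"

// flatIndex maps internal (i, j, k) coordinates to the offset in the
// contiguous storage of an array with sizes (N0, N1, N2).
// Hypothetical helper for illustration; not part of the package API.
func flatIndex(i, j, k, N1, N2 int) int {
	return i*N1*N2 + j*N2 + k
}

func main() {
	N0, N1, N2 := 2, 3, 4 // internal (I, J, K) sizes, i.e. user sizes (Z, Y, X)
	next := 0
	for i := 0; i < N0; i++ {
		for j := 0; j < N1; j++ {
			for k := 0; k < N2; k++ {
				// The typical loop order visits storage sequentially.
				if flatIndex(i, j, k, N1, N2) != next {
					panic("loop order does not match storage order")
				}
				next++
			}
		}
	}
	fmt.Println(flatIndex(1, 2, 3, N1, N2)) // last element: 1*12 + 2*4 + 3 = 23
}
```

Because k varies fastest, neighbouring k values sit next to each other in memory, which is why K benefits from being large and aligned.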
Slicing the geometry over multiple GPUs ¶
The geometry is sliced along the J-direction.
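As a rough illustration of slicing along J, the sketch below computes the J range each device would own. It assumes equal-sized slabs and that N1 is divisible by the number of GPUs; both are assumptions, and jRange is a hypothetical helper rather than a function of this package.

```go
package main

import "fmt"

// jRange returns the half-open J interval [start, stop) owned by device dev
// when N1 rows are split evenly over nGPU devices.
// Assumption: N1 is divisible by nGPU and all slabs have equal size.
func jRange(N1, nGPU, dev int) (start, stop int) {
	part := N1 / nGPU
	return dev * part, (dev + 1) * part
}

func main() {
	N1, nGPU := 8, 2
	for dev := 0; dev < nGPU; dev++ {
		start, stop := jRange(N1, nGPU, dev)
		fmt.Printf("GPU%d: j in [%d, %d)\n", dev, start, stop)
	}
}
```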
Index ¶
- Constants
- Variables
- func Add(dst, a, b *Array)
- func AddMadd(dst, a, b, c *Array, mul float64)
- func BrillouinAsync(msat0 *Array, msat0T0 *Array, T *Array, Tc *Array, S *Array, msat0Mul float64, ...)
- func CMaddAsync(dst *Array, scale complex128, kern, src *Array, stream Stream)
- func CopyPad3D(dst, src *Array)
- func CopyPad3DAsync(dst, src *Array)
- func CopyUnPad3D(dst, src *Array)
- func CpAsync(cp *Array, T *Array, Td *Array, n *Array, TdMul float64, stream Stream)
- func Div(dst, a, b *Array)
- func DivMulPow(dst, a, b, c *Array, p float64)
- func Dot(dst, a, b *Array)
- func DotMask(dst, a, b *Array, aMul, bMul []float64)
- func DotSign(dst, a, b, c *Array)
- func EnergyFlowAsync(w *Array, m *Array, R *Array, Tc *Array, S *Array, n *Array, SMul float64, ...)
- func Exchange6Async(h, m, msat0T0, lex *Array, msat0T0Mul float64, lexMul2_cellSize2 []float64, ...)
- func FFTNormLogic(logicSize []int) int
- func FFTOutputSize(logicSize []int) []int
- func InitGPU(device int, flags uint)
- func KappaAsync(kappa *Array, msat0 *Array, msat0T0 *Array, T *Array, Tc *Array, S *Array, ...)
- func LLBarLocal00NC(t *Array, h *Array, msat0T0 *Array, lambda *Array, lambdaMul []float64)
- func LLBarLocal02C(t *Array, m *Array, h *Array, msat0T0 *Array, mu *Array, muMul []float64)
- func LLBarLocal02NC(t *Array, m *Array, h *Array, msat0T0 *Array, mu *Array, muMul []float64)
- func LLBarNonlocal00NC(t *Array, h *Array, msat0T0 *Array, lambda_e *Array, lambda_eMul []float64, ...)
- func LLBarTorqueAsync(t *Array, M *Array, h *Array, msat0T0 *Array)
- func LinearCombination2Async(dst *Array, a *Array, mulA float64, b *Array, mulB float64, stream Stream)
- func LinearCombination3(dst *Array, a *Array, mulA float64, b *Array, mulB float64, c *Array, ...)
- func LinearCombination3Async(dst *Array, a *Array, mulA float64, b *Array, mulB float64, c *Array, ...)
- func LongFieldAsync(hlf *Array, m *Array, msat0T0 *Array, J *Array, n *Array, Tc *Array, Ts *Array, ...)
- func MAdd1Async(a, b *Array, mulB float64, stream Stream)
- func MAdd2Async(a, b *Array, mulB float64, c *Array, mulC float64, stream Stream)
- func Madd(dst, a, b *Array, mulB float64)
- func Mul(dst, a, b *Array)
- func Normalize(m *Array)
- func PartialMax(in, out *Array, blocks, threadsPerBlock, N int)
- func PartialMaxAbs(in, out *Array, blocks, threadsPerBlock, N int)
- func PartialMaxDiff(a, b, out *Array, blocks, threadsPerBlock, N int)
- func PartialMaxNorm3Sq(x, y, z, out *Array, blocks, threadsPerBlock, N int)
- func PartialMaxNorm3SqDiff(x1, y1, z1, x2, y2, z2, out *Array, blocks, threadsPerBlock, N int)
- func PartialMaxSum(a, b, out *Array, blocks, threadsPerBlock, N int)
- func PartialMin(in, out *Array, blocks, threadsPerBlock, N int)
- func PartialSDot(in1, in2, out *Array, blocks, threadsPerBlock, N int)
- func PartialSum(in, out *Array, blocks, threadsPerBlock, N int)
- func Qinter_async(Qi *Array, Ti *Array, Tj *Array, Gij *Array, GijMul []float64, stream Stream)
- func Qspat_async(Q *Array, T *Array, k *Array, kMul []float64, cs []float64, pbc []int)
- func ScaleNoiseAniz(h, mu, T, msat0T0 *Array, muMul []float64, ...)
- func SetDefaultFFT(name string)
- func TensSYMMVecMul(...)
- func TsSync(Ts *Array, msat *Array, msat0T0 *Array, Tc *Array, S *Array, msatMul float64, ...)
- func UniaxialAnisotropyAsync(h, m *Array, KuMask, MsatMask *Array, Ku2_Mu0MSat float64, anisUMask *Array, ...)
- func VecMadd(dst, a, b *Array, mulB []float64)
- func WeightedAverage(dst, x0, x1, w0, w1, R *Array, w0Mul, w1Mul, RMul float64)
- func ZeroArrayAsync(A *Array, stream Stream)
- type Array
- func (a *Array) Alloc()
- func (a *Array) Assign(other *Array)
- func (a *Array) Component(i int) *Array
- func (dst *Array) CopyFromDevice(src *Array)
- func (dst *Array) CopyFromHost(src *host.Array)
- func (src *Array) CopyToHost(dst *host.Array)
- func (a *Array) DevicePtr() cu.DevicePtr
- func (v *Array) Free()
- func (b *Array) Get(comp, x, y, z int) float64
- func (a *Array) Init(components int, size3D []int, alloc bool)
- func (a *Array) IsNil() bool
- func (a *Array) Len() int
- func (src *Array) LocalCopy() *host.Array
- func (a *Array) NComp() int
- func (a *Array) PartLen3D() int
- func (a *Array) PartLen4D() int
- func (a *Array) PartSize() []int
- func (shared *Array) PointTo(original *Array, offset int)
- func (a *Array) Pointer() cu.DevicePtr
- func (b *Array) Set(comp, x, y, z int, value float64)
- func (a *Array) Size3D() []int
- func (a *Array) Size4D() []int
- func (a *Array) String() string
- func (a *Array) Zero()
- type FFTInterface
- type FFTPlanX
- type Reductor
- func (r *Reductor) Dot(in1, in2 *Array) float64
- func (r *Reductor) Free()
- func (r *Reductor) Init(nComp int, size []int)
- func (r *Reductor) Max(in *Array) float64
- func (r *Reductor) MaxAbs(in *Array) float64
- func (r *Reductor) MaxDiff(a, b *Array) float64
- func (r *Reductor) MaxNorm(a *Array) float64
- func (r *Reductor) MaxNormDiff(a, b *Array) float64
- func (r *Reductor) MaxSum(a, b *Array) float64
- func (r *Reductor) Min(in *Array) float64
- func (r *Reductor) Sum(in *Array) float64
- type Stream
Constants ¶
const (
	DO_ALLOC   = true
	DONT_ALLOC = false
)
Parameters for Array.Init()
const (
	MSG_BADDEVICEID       = "Invalid device ID: "
	MSG_DEVICEUNINITIATED = "Device list not initiated"
)
Error messages.
const ERR_UNIFIED_ADDR = "A GPU does not support unified addressing and can not be used in a multi-GPU setup."
Error message.
const MSG_ARRAY_SIZE_MISMATCH = "array size mismatch"
Error message.
Variables ¶
var NewDefaultFFT func(dataSize, logicSize []int) FFTInterface = NewFFTPlanX // this default is for tests, not sims.
The default FFT constructor. The function pointer may be changed to use a different FFT implementation globally.
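The pattern of a package-level constructor variable can be sketched in isolation. Everything below (the Plan interface, planX, planY and newDefaultPlan) is a hypothetical stand-in for FFTInterface, NewFFTPlanX and NewDefaultFFT; the real interface's methods are not shown in this documentation.

```go
package main

import "fmt"

// Plan stands in for FFTInterface. Its method set is an assumption
// made only so the example compiles.
type Plan interface {
	Name() string
}

type planX struct{}

func (planX) Name() string { return "X" }

type planY struct{}

func (planY) Name() string { return "Y" }

func newPlanX(dataSize, logicSize []int) Plan { return planX{} }
func newPlanY(dataSize, logicSize []int) Plan { return planY{} }

// newDefaultPlan plays the role of NewDefaultFFT: callers construct plans
// through this variable, so reassigning it changes the implementation globally.
var newDefaultPlan func(dataSize, logicSize []int) Plan = newPlanX

func main() {
	size := []int{1, 32, 32}
	fmt.Println(newDefaultPlan(size, size).Name()) // default implementation
	newDefaultPlan = newPlanY                      // swap implementation globally
	fmt.Println(newDefaultPlan(size, size).Name())
}
```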
Functions ¶
func BrillouinAsync ¶
func CMaddAsync ¶
func CMaddAsync(dst *Array, scale complex128, kern, src *Array, stream Stream)
Complex multiply-add. dst and src contain complex numbers (interleaved format); kern contains real numbers.
dst[i] += scale * kern[i] * src[i]
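A host-side reference of this formula may help when checking results. The sketch assumes that interleaved format means (real, imaginary) pairs and that kern holds one real value per complex element; cmaddRef is a hypothetical name and runs on plain slices, not on GPU arrays.

```go
package main

import "fmt"

// cmaddRef computes dst[i] += scale * kern[i] * src[i] on the host.
// Assumptions: dst and src store complex numbers as interleaved
// (re, im) pairs, and kern stores one real number per complex element.
func cmaddRef(dst []float64, scale complex128, kern, src []float64) {
	for i := range kern {
		s := complex(src[2*i], src[2*i+1])
		d := scale * complex(kern[i], 0) * s
		dst[2*i] += real(d)
		dst[2*i+1] += imag(d)
	}
}

func main() {
	dst := []float64{0, 0, 1, 1} // two complex numbers: 0 and 1+1i
	src := []float64{1, 0, 0, 1} // two complex numbers: 1 and 1i
	kern := []float64{2, 3}
	cmaddRef(dst, complex(0, 1), kern, src)
	fmt.Println(dst)
}
```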
func CopyPad3D ¶
func CopyPad3D(dst, src *Array)
Padding of a 3D matrix; only to be used when Ndev=1. Copies from src to dst, which have different size3D. If dst is smaller, the src input is cropped to the right size. If dst is larger, the src input is padded with zeros to the right size.
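The crop-or-pad behaviour can be sketched on the host as follows. The sketch assumes src is aligned with the origin corner of dst and that both use the contiguous i*N1*N2 + j*N2 + k layout; copyPad3DRef is a hypothetical reference, not the GPU implementation.

```go
package main

import "fmt"

// copyPad3DRef copies src into dst, cropping where dst is smaller and
// zero-padding where dst is larger.
// Assumption: src is anchored at index (0,0,0) of dst.
func copyPad3DRef(dst []float64, dstSize [3]int, src []float64, srcSize [3]int) {
	for i := 0; i < dstSize[0]; i++ {
		for j := 0; j < dstSize[1]; j++ {
			for k := 0; k < dstSize[2]; k++ {
				d := (i*dstSize[1]+j)*dstSize[2] + k
				if i < srcSize[0] && j < srcSize[1] && k < srcSize[2] {
					dst[d] = src[(i*srcSize[1]+j)*srcSize[2]+k]
				} else {
					dst[d] = 0 // outside src: pad with zero
				}
			}
		}
	}
}

func main() {
	src := []float64{1, 2, 3, 4} // size (1, 2, 2)
	padded := make([]float64, 9) // size (1, 3, 3)
	copyPad3DRef(padded, [3]int{1, 3, 3}, src, [3]int{1, 2, 2})
	fmt.Println(padded)

	cropped := make([]float64, 2) // size (1, 1, 2)
	copyPad3DRef(cropped, [3]int{1, 1, 2}, src, [3]int{1, 2, 2})
	fmt.Println(cropped)
}
```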
func CopyPad3DAsync ¶
func CopyPad3DAsync(dst, src *Array)
func CopyUnPad3D ¶
func CopyUnPad3D(dst, src *Array)
func Div ¶
func Div(dst, a, b *Array)
Divide 2 multi-GPU arrays: dst = a / b; if b = 0 then dst = 0.
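The zero-safe division rule can be written as a small host-side reference. divRef is a hypothetical name and works on plain slices, whereas the package function operates on multi-GPU arrays.

```go
package main

import "fmt"

// divRef computes dst[i] = a[i] / b[i], with dst[i] = 0 wherever b[i] == 0.
func divRef(dst, a, b []float64) {
	for i := range dst {
		if b[i] == 0 {
			dst[i] = 0
		} else {
			dst[i] = a[i] / b[i]
		}
	}
}

func main() {
	a := []float64{1, 2, 3}
	b := []float64{2, 0, 4}
	dst := make([]float64, 3)
	divRef(dst, a, b)
	fmt.Println(dst) // the middle element is 0 instead of +Inf
}
```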
func DivMulPow ¶
Divide and multiply by an array raised to a power: dst = pow(c, p) * a / b; if b = 0 then dst = a, if c = 0 then dst = 0.
func Dot ¶
func Dot(dst, a, b *Array)
Synchronous dot product: C = A_i B_i (summed over the component index i). A and B should not be masks.
func DotSign ¶
func DotSign(dst, a, b, c *Array)
Synchronous signed dot product: C = sign(BC) * (AB)
func EnergyFlowAsync ¶
func Exchange6Async ¶
func Exchange6Async(h, m, msat0T0, lex *Array, msat0T0Mul float64, lexMul2_cellSize2 []float64, periodic []int, stream Stream)
6-neighbor exchange field. Aex2_mu0Msatmul: 2 * Aex / Mu0 * Msat.multiplier
func FFTNormLogic ¶
Returns the normalization factor of an FFT with this logic size, which is just the product of the sizes.
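Since the factor is just the product of the sizes, it can be reproduced in a few lines. The helper name fftNorm is hypothetical. The usage note assumes an unnormalized transform pair, which is common for FFT libraries but is not stated in this documentation.

```go
package main

import "fmt"

// fftNorm returns the product of the logic sizes.
// Hypothetical host-side equivalent of the description above.
func fftNorm(logicSize []int) int {
	n := 1
	for _, s := range logicSize {
		n *= s
	}
	return n
}

func main() {
	logicSize := []int{2, 4, 8}
	fmt.Println(fftNorm(logicSize)) // 2*4*8 = 64
}
```

With an unnormalized forward and inverse transform, dividing the round-trip result by this factor would recover the original data.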
func FFTOutputSize ¶
Returns the output size of an FFT with given logic size.
func KappaAsync ¶
func LLBarLocal00NC ¶
func LLBarLocal02C ¶
func LLBarLocal02NC ¶
func LLBarNonlocal00NC ¶
func LinearCombination2Async ¶
func LinearCombination2Async(dst *Array, a *Array, mulA float64, b *Array, mulB float64, stream Stream)
dst[i] = a[i]*mulA + b[i]*mulB
func LinearCombination3 ¶
func LinearCombination3(dst *Array, a *Array, mulA float64, b *Array, mulB float64, c *Array, mulC float64)
dst[i] = a[i]*mulA + b[i]*mulB + c[i]*mulC
func LinearCombination3Async ¶
func LinearCombination3Async(dst *Array, a *Array, mulA float64, b *Array, mulB float64, c *Array, mulC float64, stream Stream)
dst[i] = a[i]*mulA + b[i]*mulB + c[i]*mulC
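For testing, the linear-combination formulas above can be mirrored on the host. The function names below are hypothetical and operate on plain slices; the stream argument of the asynchronous variants has no host-side counterpart here.

```go
package main

import "fmt"

// linComb2Ref computes dst[i] = a[i]*mulA + b[i]*mulB.
func linComb2Ref(dst, a []float64, mulA float64, b []float64, mulB float64) {
	for i := range dst {
		dst[i] = a[i]*mulA + b[i]*mulB
	}
}

// linComb3Ref computes dst[i] = a[i]*mulA + b[i]*mulB + c[i]*mulC.
func linComb3Ref(dst, a []float64, mulA float64, b []float64, mulB float64, c []float64, mulC float64) {
	for i := range dst {
		dst[i] = a[i]*mulA + b[i]*mulB + c[i]*mulC
	}
}

func main() {
	a := []float64{1, 2}
	b := []float64{10, 20}
	c := []float64{100, 200}
	dst := make([]float64, 2)

	linComb2Ref(dst, a, 2, b, 0.5)
	fmt.Println(dst)

	linComb3Ref(dst, a, 1, b, 1, c, 1)
	fmt.Println(dst)
}
```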
func LongFieldAsync ¶
func MAdd1Async ¶
Asynchronous multiply-add: a += mulB*b. b may contain NULL pointers, which are treated as all 1's.
func MAdd2Async ¶
Asynchronous multiply-add: a += mulB*b + mulC*c. b and c may contain NULL pointers, which are treated as all 1's.
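The convention that a missing array behaves like an array of ones can be illustrated with a host-side sketch. Here a nil slice models the NULL device pointers; that mapping and the name madd1Ref are assumptions made for illustration.

```go
package main

import "fmt"

// madd1Ref computes a[i] += mulB * b[i].
// A nil b stands in for NULL device pointers and is treated as all 1's,
// so the update reduces to a[i] += mulB.
func madd1Ref(a, b []float64, mulB float64) {
	for i := range a {
		bi := 1.0
		if b != nil {
			bi = b[i]
		}
		a[i] += mulB * bi
	}
}

func main() {
	a := []float64{1, 2}
	madd1Ref(a, nil, 3) // space-independent quantity: adds 3 everywhere
	fmt.Println(a)

	a = []float64{1, 2}
	madd1Ref(a, []float64{2, 0}, 0.5)
	fmt.Println(a)
}
```

This is how a space-independent quantity can be passed as a multiplier alone, without allocating a full array for it.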
func PartialMax ¶
Partial maxima (see reduce.h)
func PartialMaxAbs ¶
Partial maxima of absolute values (see reduce.h)
func PartialMaxDiff ¶
Partial maximum difference between arrays (see reduce.h)
func PartialMaxNorm3Sq ¶
Partial maximum of Euclidean norm squared (see reduce.h)
func PartialMaxNorm3SqDiff ¶
Partial maximum of Euclidean norm squared of the difference between two 3-vector arrays (see reduce.h)
func PartialMaxSum ¶
Partial maximum sum of two arrays (see reduce.h)
func PartialMin ¶
Partial minima (see reduce.h)
func PartialSDot ¶
Partial dot products (see reduce.h)
func PartialSum ¶
Partial sums (see reduce.h)
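The idea behind the Partial* functions is a two-stage reduction: each block produces one partial result, and the partial results are then combined. The host-side sketch below illustrates this for sums. The grid-stride access pattern, the assumption that out holds one value per block, and the function names are all illustrative guesses; reduce.h is not reproduced here and may differ.

```go
package main

import "fmt"

// partialSumRef emulates a block-wise reduction on the host.
// Assumptions: out has one entry per block, and each (block, thread)
// pair walks the input with a stride of blocks*threadsPerBlock.
func partialSumRef(in, out []float64, blocks, threadsPerBlock, N int) {
	stride := blocks * threadsPerBlock
	for b := 0; b < blocks; b++ {
		sum := 0.0
		for t := 0; t < threadsPerBlock; t++ {
			for i := b*threadsPerBlock + t; i < N; i += stride {
				sum += in[i]
			}
		}
		out[b] = sum
	}
}

// sumRef finishes the reduction by adding up the per-block partial sums.
func sumRef(in []float64, blocks, threadsPerBlock int) float64 {
	out := make([]float64, blocks)
	partialSumRef(in, out, blocks, threadsPerBlock, len(in))
	total := 0.0
	for _, p := range out {
		total += p
	}
	return total
}

func main() {
	in := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(sumRef(in, 2, 2)) // 55
}
```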
func Qinter_async ¶
func Qspat_async ¶
func ScaleNoiseAniz ¶
func TensSYMMVecMul ¶
func UniaxialAnisotropyAsync ¶
func UniaxialAnisotropyAsync(h, m *Array, KuMask, MsatMask *Array, Ku2_Mu0MSat float64, anisUMask *Array, anisUMul []float64, stream Stream)
Computes the uniaxial anisotropy field, stores in h.
func VecMadd ¶
3-vector multiply-add: dst_i = a_i + mulB_i*b_i. b may contain NULL pointers, which are treated as all 1's.
func WeightedAverage ¶
func ZeroArrayAsync ¶
Types ¶
type Array ¶
type Array struct {
	Stream       // GPU stream for general use with this array
	Comp []Array // X,Y,Z components as arrays
	// contains filtered or unexported fields
}
A MuMax Array represents a 3-dimensional array of N-vectors.
Layout example for a (3,4) vsplice on 2 GPUs:
GPU0: X0 X1 Y0 Y1 Z0 Z1
GPU1: X2 X3 Y2 Y3 Z2 Z3
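The layout example can be reproduced with a short sketch that lists which elements each GPU stores. It assumes at most three components (named X, Y, Z) and a length divisible by the number of GPUs; the function name layout is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// layout lists, per GPU, the stored elements of an array with nComp
// components of the given length. Each GPU holds its part of every
// component, with the parts stored component after component.
// Assumptions: nComp <= 3 and length is divisible by nGPU.
func layout(nComp, length, nGPU int) []string {
	names := []string{"X", "Y", "Z"}
	part := length / nGPU
	out := make([]string, nGPU)
	for g := 0; g < nGPU; g++ {
		var elems []string
		for c := 0; c < nComp; c++ {
			for e := g * part; e < (g+1)*part; e++ {
				elems = append(elems, fmt.Sprintf("%s%d", names[c], e))
			}
		}
		out[g] = strings.Join(elems, " ")
	}
	return out
}

func main() {
	for g, line := range layout(3, 4, 2) {
		fmt.Printf("GPU%d: %s\n", g, line)
	}
}
```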
func NilArray ¶
Returns an array without underlying storage. This is used for space-independent quantities. These pass a multiplier value and a null pointer for each GPU. A NilArray already has null pointers for each GPU set, so it is more convenient than just a nil pointer of type *Array. See: Alloc()
func (*Array) Alloc ¶
func (a *Array) Alloc()
If the array has no underlying storage yet (e.g., it was created by NilArray()), allocate that storage.
func (*Array) Component ¶
Gets the i'th component as an array. E.g.: Component(0) is the x-component.
func (*Array) CopyFromDevice ¶
Copy from device array to device array.
func (*Array) CopyFromHost ¶
Copy from host array to device array.
func (*Array) CopyToHost ¶
Copy from device array to host array.
func (*Array) Init ¶
Initializes the array to hold a field with the number of components and given size.
Init(3, 1000) // gives an array of 1000 3-vectors
Init(1, 1000) // gives an array of 1000 scalars
Init(6, 1000) // gives an array of 1000 6-vectors or symmetric tensors
Storage is allocated only if alloc == true.
func (*Array) IsNil ¶
True if the array has no underlying GPU storage, e.g., when created by NilArray().
func (*Array) PointTo ¶
Lets the pointers of an already initialized, but not allocated array (shared) point to an allocated array (original) possibly with an offset.
type FFTInterface ¶
Interface for any sparse FFT plan.
func NewFFTPlanX ¶
func NewFFTPlanX(dataSize, logicSize []int) FFTInterface
type Reductor ¶
type Reductor struct {
	N int
	// contains filtered or unexported fields
}
A Reductor stores the buffers needed to reduce data across multiple GPUs. It can be used to sum data, take minima, maxima, etc.
func NewReductor ¶
Makes a reductor for reducing an array of the given size.
func (*Reductor) MaxNormDiff ¶
Takes the maximum norm of the difference between two 3-component (vector) arrays.
Source Files
- array.go
- brillouin-wrapper.go
- doc.go
- fftinterface.go
- fftplanX.go
- kappa_wrapper.go
- libhotspin.go
- long_field_wrapper.go
- multigpu.go
- qinter-wrapper.go
- qspat-wrapper.go
- reductor.go
- stream.go
- temperature_wrapper.go
- wrap_Ts.go
- wrap_cp.go
- wrap_energy-flow.go
- wrap_llbar-local00nc.go
- wrap_llbar-local02c.go
- wrap_llbar-local02nc.go
- wrap_llbar-nonlocal00nc.go
- wrap_llbar-torque.go