gpu

package
v0.0.0-...-a20b597
Warning

This package is not in the latest version of its module.

Published: Feb 17, 2015 License: GPL-3.0 Imports: 9 Imported by: 0

Documentation

Overview

Package gpu provides multi-GPU primitives such as array allocation and copying.

3D Array indexing.

Internal dimensions are labeled (I,J,K), I being the outermost dimension, K the innermost. A typical loop reads:

for i := 0; i < N0; i++ {
	for j := 0; j < N1; j++ {
		for k := 0; k < N2; k++ {
			...
		}
	}
}

I may be a small dimension, but K should preferably be large and alignable in CUDA memory.

The underlying contiguous storage is indexed as:

index := i*N1*N2 + j*N2 + k

The "internal" (I,J,K) dimensions correspond to the "user" dimensions (Z,Y,X)! Z is typically the smallest dimension like the thickness.
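The indexing rule above can be sketched in plain Go as follows. This is only an illustration: the helper name flatIndex and the example sizes are hypothetical and not part of the package.

```go
package main

import "fmt"

// flatIndex returns the offset of element (i, j, k) in contiguous storage
// of size N0 x N1 x N2, where k is the innermost (fastest-varying) index.
// N0 is not needed to compute the offset.
func flatIndex(i, j, k, n1, n2 int) int {
	return i*n1*n2 + j*n2 + k
}

func main() {
	// Hypothetical user size X=8, Y=4, Z=2, stored internally as (I,J,K) = (Z,Y,X).
	n0, n1, n2 := 2, 4, 8
	fmt.Println(flatIndex(1, 2, 3, n1, n2))          // 1*32 + 2*8 + 3 = 51
	fmt.Println(flatIndex(n0-1, n1-1, n2-1, n1, n2)) // last element: 63
}
```

Because K is innermost, neighbouring k values are adjacent in memory, which is why K should be the large, alignable dimension.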

Slicing the geometry over multiple GPUs

The geometry is sliced along the J-direction.
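As a rough sketch of what slicing along J could look like, the snippet below splits the J range evenly over the devices. The helper jRange is hypothetical, and the even split (N1 divisible by the number of GPUs) is an assumption; the package may handle remainders or alignment differently.

```go
package main

import "fmt"

// jRange returns the half-open J interval [start, stop) owned by device dev
// when n1 rows are split evenly over nDev devices.
// Assumption: n1 is divisible by nDev.
func jRange(dev, nDev, n1 int) (start, stop int) {
	part := n1 / nDev
	return dev * part, (dev + 1) * part
}

func main() {
	for dev := 0; dev < 2; dev++ {
		start, stop := jRange(dev, 2, 8)
		fmt.Printf("GPU%d: j in [%d, %d)\n", dev, start, stop)
	}
}
```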

Constants

const (
	DO_ALLOC   = true
	DONT_ALLOC = false
)

Parameters for Array.Init()

const (
	MSG_BADDEVICEID       = "Invalid device ID: "
	MSG_DEVICEUNINITIATED = "Device list not initiated"
)

Error message

const ERR_UNIFIED_ADDR = "A GPU does not support unified addressing and can not be used in a multi-GPU setup."

Error message

const MSG_ARRAY_SIZE_MISMATCH = "array size mismatch"

Error message.

Variables

var NewDefaultFFT func(dataSize, logicSize []int) FFTInterface = NewFFTPlanX // this default is for tests, not sims.

The default FFT constructor. The function pointer may be changed to use a different FFT implementation globally.
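The pattern of a package-level constructor variable can be illustrated without any GPU code. In the sketch below, Plan, planA, planB and newDefault are hypothetical stand-ins for FFTInterface, concrete plans and NewDefaultFFT; only the swapping mechanism is the point.

```go
package main

import "fmt"

// Plan is a stand-in for FFTInterface; a name is enough for this sketch.
type Plan interface{ Name() string }

type planA struct{}
type planB struct{}

func (planA) Name() string { return "A" }
func (planB) Name() string { return "B" }

func newPlanA(dataSize, logicSize []int) Plan { return planA{} }
func newPlanB(dataSize, logicSize []int) Plan { return planB{} }

// newDefault mirrors the idea of NewDefaultFFT: a constructor stored in a variable.
var newDefault func(dataSize, logicSize []int) Plan = newPlanA

func main() {
	size := []int{1, 4, 8}
	fmt.Println(newDefault(size, size).Name())
	newDefault = newPlanB // every later caller now gets the other implementation
	fmt.Println(newDefault(size, size).Name())
}
```

Since the variable is global, changing it affects all code that constructs plans afterwards, so it is best set once during start-up.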

Functions

func Add

func Add(dst, a, b *Array)

Adds 2 multi-GPU arrays: dst = a + b

func AddMadd

func AddMadd(dst, a, b, c *Array, mul float64)

Multiply-add: dst = a + mul * (b + c). b may NOT contain NULL pointers!

func BrillouinAsync

func BrillouinAsync(msat0 *Array, msat0T0 *Array, T *Array, Tc *Array, S *Array, msat0Mul float64, msat0T0Mul float64, TcMul float64, SMul float64, stream Stream)

func CMaddAsync

func CMaddAsync(dst *Array, scale complex128, kern, src *Array, stream Stream)

Complex multiply-add. dst and src contain complex numbers (interleaved format); kern contains real numbers.

dst[i] += scale * kern[i] * src[i]
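A CPU reference for the formula above might look like the following. It is a sketch, not the GPU kernel: the function name cmadd is hypothetical, and the element type float64 and the (re, im) pair ordering of the interleaved format are assumptions.

```go
package main

import "fmt"

// cmadd computes dst[i] += scale * kern[i] * src[i] on the CPU.
// dst and src hold complex numbers interleaved as (re, im) pairs;
// kern holds one real number per complex element.
func cmadd(dst []float64, scale complex128, kern, src []float64) {
	for i := range kern {
		s := complex(src[2*i], src[2*i+1])
		v := scale * complex(kern[i], 0) * s
		dst[2*i] += real(v)
		dst[2*i+1] += imag(v)
	}
}

func main() {
	dst := []float64{0, 0, 1, 1}
	src := []float64{1, 2, 3, 4} // (1+2i), (3+4i)
	kern := []float64{2, 0.5}
	cmadd(dst, complex(0, 1), kern, src) // scale = i
	fmt.Println(dst)                     // [-4 2 -1 2.5]
}
```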

func CopyPad3D

func CopyPad3D(dst, src *Array)

Pads a 3D matrix; only to be used when Ndev=1. Copies from src to dst, which have different size3D. If dst is smaller, the src input is cropped to the right size. If dst is larger, the src input is padded with zeros to the right size.
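The crop-or-pad behaviour can be sketched on the CPU as below. The function copyPad3D here is a hypothetical plain-Go stand-in operating on slices, and it assumes that src is placed at the origin corner of dst; the package's kernel may position the data differently.

```go
package main

import "fmt"

// copyPad3D copies src (size srcSize) into dst (size dstSize). Both are stored
// contiguously with index i*N1*N2 + j*N2 + k. Cells of dst outside src become
// zero; cells of src outside dst are dropped.
func copyPad3D(dst []float64, dstSize [3]int, src []float64, srcSize [3]int) {
	for i := 0; i < dstSize[0]; i++ {
		for j := 0; j < dstSize[1]; j++ {
			for k := 0; k < dstSize[2]; k++ {
				d := (i*dstSize[1]+j)*dstSize[2] + k
				if i < srcSize[0] && j < srcSize[1] && k < srcSize[2] {
					dst[d] = src[(i*srcSize[1]+j)*srcSize[2]+k]
				} else {
					dst[d] = 0
				}
			}
		}
	}
}

func main() {
	src := []float64{1, 2, 3, 4} // size (1,2,2)
	dst := make([]float64, 9)    // size (1,3,3)
	copyPad3D(dst, [3]int{1, 3, 3}, src, [3]int{1, 2, 2})
	fmt.Println(dst) // [1 2 0 3 4 0 0 0 0]
}
```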

func CopyPad3DAsync

func CopyPad3DAsync(dst, src *Array)

func CopyUnPad3D

func CopyUnPad3D(dst, src *Array)

func CpAsync

func CpAsync(cp *Array, T *Array, Td *Array, n *Array, TdMul float64, stream Stream)

func Div

func Div(dst, a, b *Array)

Divide 2 multi-GPU arrays: dst = a / b; if b = 0 then dst = 0
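The zero-guarded division can be written as a short CPU reference. The slice-based helper div is hypothetical and only mirrors the rule stated above.

```go
package main

import "fmt"

// div computes dst[i] = a[i] / b[i], writing 0 wherever b[i] is 0.
func div(dst, a, b []float64) {
	for i := range dst {
		if b[i] == 0 {
			dst[i] = 0
		} else {
			dst[i] = a[i] / b[i]
		}
	}
}

func main() {
	dst := make([]float64, 3)
	div(dst, []float64{6, 1, 9}, []float64{3, 0, 2})
	fmt.Println(dst) // [2 0 4.5]
}
```

The guard avoids Inf or NaN values in cells where the divisor is zero, for example outside the magnetic material.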

func DivMulPow

func DivMulPow(dst, a, b, c *Array, p float64)

Divide and multiply by the array raised to the power: dst = pow(c, p) * a / b; if b = 0 then dst = a, if c = 0 then dst = 0

func Dot

func Dot(dst, a, b *Array)

Synchronous dot product: C = A_i B_i. A and B should not be masks.

func DotMask

func DotMask(dst, a, b *Array, aMul, bMul []float64)

Synchronous dot product: C = A_i B_i. A and B could be masks.

func DotSign

func DotSign(dst, a, b, c *Array)

Synchronous signed dot product: C = sign(BC) * (AB)

func EnergyFlowAsync

func EnergyFlowAsync(w *Array, m *Array, R *Array, Tc *Array, S *Array, n *Array, SMul float64, stream Stream)

func Exchange6Async

func Exchange6Async(h, m, msat0T0, lex *Array, msat0T0Mul float64, lexMul2_cellSize2 []float64, periodic []int, stream Stream)

6-neighbor exchange field. Aex2_mu0Msatmul: 2 * Aex / Mu0 * Msat.multiplier

func FFTNormLogic

func FFTNormLogic(logicSize []int) int

Returns the normalization factor of an FFT with this logic size. (just the product of the sizes)
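Since the normalization factor is described as the product of the sizes, a minimal stand-alone version could read as follows. The name fftNorm is hypothetical.

```go
package main

import "fmt"

// fftNorm returns the product of the logic sizes, i.e. the factor by which
// an unnormalized forward+inverse FFT pair scales the data.
func fftNorm(logicSize []int) int {
	n := 1
	for _, s := range logicSize {
		n *= s
	}
	return n
}

func main() {
	fmt.Println(fftNorm([]int{2, 4, 8})) // 64
}
```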

func FFTOutputSize

func FFTOutputSize(logicSize []int) []int

Returns the output size of an FFT with given logic size.

func InitGPU

func InitGPU(device int, flags uint)

Sets a list of devices to use.

func KappaAsync

func KappaAsync(kappa *Array, msat0 *Array, msat0T0 *Array, T *Array, Tc *Array, S *Array, n *Array, msat0Mul float64, msat0T0Mul float64, TcMul float64, SMul float64, stream Stream)

func LLBarLocal00NC

func LLBarLocal00NC(t *Array, h *Array, msat0T0 *Array, lambda *Array, lambdaMul []float64)

func LLBarLocal02C

func LLBarLocal02C(t *Array, m *Array, h *Array, msat0T0 *Array, mu *Array, muMul []float64)

func LLBarLocal02NC

func LLBarLocal02NC(t *Array, m *Array, h *Array, msat0T0 *Array, mu *Array, muMul []float64)

func LLBarNonlocal00NC

func LLBarNonlocal00NC(t *Array, h *Array, msat0T0 *Array, lambda_e *Array, lambda_eMul []float64, cellsizeX float64, cellsizeY float64, cellsizeZ float64, pbc []int)

func LLBarTorqueAsync

func LLBarTorqueAsync(t *Array, M *Array, h *Array, msat0T0 *Array)

func LinearCombination2Async

func LinearCombination2Async(dst *Array, a *Array, mulA float64, b *Array, mulB float64, stream Stream)

dst[i] = a[i]*mulA + b[i]*mulB

func LinearCombination3

func LinearCombination3(dst *Array, a *Array, mulA float64, b *Array, mulB float64, c *Array, mulC float64)

dst[i] = a[i]*mulA + b[i]*mulB + c[i]*mulC

func LinearCombination3Async

func LinearCombination3Async(dst *Array, a *Array, mulA float64, b *Array, mulB float64, c *Array, mulC float64, stream Stream)

dst[i] = a[i]*mulA + b[i]*mulB + c[i]*mulC

func LongFieldAsync

func LongFieldAsync(hlf *Array, m *Array, msat0T0 *Array, J *Array, n *Array, Tc *Array, Ts *Array, msat0T0Mul float64, JMul float64, nMul float64, TcMul float64, TsMul float64, stream Stream)

func MAdd1Async

func MAdd1Async(a, b *Array, mulB float64, stream Stream)

Asynchronous multiply-add: a += mulB*b. b may contain NULL pointers, implemented as all 1's.

func MAdd2Async

func MAdd2Async(a, b *Array, mulB float64, c *Array, mulC float64, stream Stream)

Asynchronous multiply-add: a += mulB*b + mulC*c. b and c may contain NULL pointers, implemented as all 1's.
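The "NULL pointer means all 1's" convention can be sketched on the CPU with nil slices. The helpers at and madd2 are hypothetical; a nil slice plays the role of an array without storage, so only its multiplier contributes.

```go
package main

import "fmt"

// at returns mask[i], or 1 when the mask has no storage (nil).
func at(mask []float64, i int) float64 {
	if mask == nil {
		return 1
	}
	return mask[i]
}

// madd2 computes a[i] += mulB*b[i] + mulC*c[i], treating nil b or c as all ones.
func madd2(a, b []float64, mulB float64, c []float64, mulC float64) {
	for i := range a {
		a[i] += mulB*at(b, i) + mulC*at(c, i)
	}
}

func main() {
	a := []float64{1, 2}
	madd2(a, nil, 0.5, []float64{10, 20}, 2) // b is space-independent: 0.5 everywhere
	fmt.Println(a)                           // [21.5 42.5]
}
```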

func Madd

func Madd(dst, a, b *Array, mulB float64)

func Mul

func Mul(dst, a, b *Array)

Multiply 2 multi-GPU arrays: dst = a * b

func Normalize

func Normalize(m *Array)

Normalize

func PartialMax

func PartialMax(in, out *Array, blocks, threadsPerBlock, N int)

Partial maxima (see reduce.h)

func PartialMaxAbs

func PartialMaxAbs(in, out *Array, blocks, threadsPerBlock, N int)

Partial maxima of absolute values (see reduce.h)

func PartialMaxDiff

func PartialMaxDiff(a, b, out *Array, blocks, threadsPerBlock, N int)

Partial maximum difference between arrays (see reduce.h)

func PartialMaxNorm3Sq

func PartialMaxNorm3Sq(x, y, z, out *Array, blocks, threadsPerBlock, N int)

Partial maximum of Euclidean norm squared (see reduce.h)

func PartialMaxNorm3SqDiff

func PartialMaxNorm3SqDiff(x1, y1, z1, x2, y2, z2, out *Array, blocks, threadsPerBlock, N int)

Partial maximum of Euclidean norm squared of difference between two 3-vector arrays (see reduce.h)

func PartialMaxSum

func PartialMaxSum(a, b, out *Array, blocks, threadsPerBlock, N int)

Partial maximum sum of arrays (see reduce.h)

func PartialMin

func PartialMin(in, out *Array, blocks, threadsPerBlock, N int)

Partial minima (see reduce.h)

func PartialSDot

func PartialSDot(in1, in2, out *Array, blocks, threadsPerBlock, N int)

Partial dot products (see reduce.h)

func PartialSum

func PartialSum(in, out *Array, blocks, threadsPerBlock, N int)

Partial sums (see reduce.h)
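The Partial* functions produce one intermediate value per block, which is then reduced further. A CPU sketch of that two-stage idea is shown below. The names partialSum and sum are hypothetical, and the strided assignment of elements to blocks is an assumption; reduce.h may partition the input differently.

```go
package main

import "fmt"

// partialSum writes one partial sum per block into out (len(out) == blocks).
// Assumption: block b handles elements b, b+blocks, b+2*blocks, ...
func partialSum(in, out []float64, blocks, n int) {
	for b := 0; b < blocks; b++ {
		s := 0.0
		for i := b; i < n; i += blocks {
			s += in[i]
		}
		out[b] = s
	}
}

// sum reduces the partial results to a single value.
func sum(in []float64, blocks int) float64 {
	out := make([]float64, blocks)
	partialSum(in, out, blocks, len(in))
	total := 0.0
	for _, p := range out {
		total += p
	}
	return total
}

func main() {
	in := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	out := make([]float64, 4)
	partialSum(in, out, 4, len(in))
	fmt.Println(out)        // [15 18 10 12]
	fmt.Println(sum(in, 4)) // 55
}
```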

func Qinter_async

func Qinter_async(Qi *Array, Ti *Array, Tj *Array, Gij *Array, GijMul []float64, stream Stream)

func Qspat_async

func Qspat_async(Q *Array, T *Array, k *Array, kMul []float64, cs []float64, pbc []int)

func ScaleNoiseAniz

func ScaleNoiseAniz(h, mu, T, msat0T0 *Array,
	muMul []float64,
	KB2tempMul_mu0VgammaDtMsatMul float64)

func SetDefaultFFT

func SetDefaultFFT(name string)

Sets a global default FFT

func TensSYMMVecMul

func TensSYMMVecMul(dstX, dstY, dstZ, srcX, srcY, srcZ, kernXX, kernYY, kernZZ, kernYZ, kernXZ, kernXY *Array,
	srcMul float64,
	Nx, Ny, Nz int, stream Stream)

func TsSync

func TsSync(Ts *Array, msat *Array, msat0T0 *Array, Tc *Array, S *Array, msatMul float64, msat0T0Mul float64, TcMul float64, SMul float64)

func UniaxialAnisotropyAsync

func UniaxialAnisotropyAsync(h, m *Array, KuMask, MsatMask *Array, Ku2_Mu0MSat float64, anisUMask *Array, anisUMul []float64, stream Stream)

Computes the uniaxial anisotropy field, stores in h.

func VecMadd

func VecMadd(dst, a, b *Array, mulB []float64)

3-vector multiply-add: dst_i = a_i + mulB_i*b_i. b may contain NULL pointers, implemented as all 1's.

func WeightedAverage

func WeightedAverage(dst, x0, x1, w0, w1, R *Array, w0Mul, w1Mul, RMul float64)

func ZeroArrayAsync

func ZeroArrayAsync(A *Array, stream Stream)

Types

type Array

type Array struct {
	Stream         // GPU stream for general use with this array
	Comp   []Array // X,Y,Z components as arrays
	// contains filtered or unexported fields
}

A MuMax Array represents a 3-dimensional array of N-vectors.

Layout example for a (3,4) vsplice on 2 GPUs:

GPU0: X0 X1  Y0 Y1 Z0 Z1
GPU1: X2 X3  Y2 Y3 Z2 Z3
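The layout above can be reproduced with a few lines of Go. The helper layout is hypothetical and assumes the component length divides evenly over the GPUs.

```go
package main

import (
	"fmt"
	"strings"
)

// layout lists, per GPU, which elements of each component it holds when every
// component of length n is split evenly over nDev devices.
func layout(comps []string, n, nDev int) []string {
	part := n / nDev // assumes n is divisible by nDev
	lines := make([]string, nDev)
	for dev := range lines {
		var cells []string
		for _, c := range comps {
			for i := dev * part; i < (dev+1)*part; i++ {
				cells = append(cells, fmt.Sprintf("%s%d", c, i))
			}
		}
		lines[dev] = fmt.Sprintf("GPU%d: %s", dev, strings.Join(cells, " "))
	}
	return lines
}

func main() {
	for _, line := range layout([]string{"X", "Y", "Z"}, 4, 2) {
		fmt.Println(line)
	}
}
```

Each GPU thus holds its own slice of every component, stored component after component.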

func NewArray

func NewArray(components int, size3D []int) *Array

Returns an array which holds a field with the number of components and given size.

func NilArray

func NilArray(components int, size3D []int) *Array

Returns an array without underlying storage. This is used for space-independent quantities. These pass a multiplier value and a null pointer for each GPU. A NilArray already has null pointers for each GPU set, so it is more convenient than just a nil pointer of type *Array. See: Alloc()

func (*Array) Alloc

func (a *Array) Alloc()

If the array has no underlying storage yet (e.g., it was created by NilArray()), allocate that storage.

func (*Array) Assign

func (a *Array) Assign(other *Array)

a = other (accessible from packages where Array is not assignable)

func (*Array) Component

func (a *Array) Component(i int) *Array

Gets the i'th component as an array. E.g.: Component(0) is the x-component.

func (*Array) CopyFromDevice

func (dst *Array) CopyFromDevice(src *Array)

Copy from device array to device array.

func (*Array) CopyFromHost

func (dst *Array) CopyFromHost(src *host.Array)

Copy from host array to device array.

func (*Array) CopyToHost

func (src *Array) CopyToHost(dst *host.Array)

Copy from device array to host array.

func (*Array) DevicePtr

func (a *Array) DevicePtr() cu.DevicePtr

Address of part of the array on each GPU device

func (*Array) Free

func (v *Array) Free()

Frees the underlying storage and sets the size to zero.

func (*Array) Get

func (b *Array) Get(comp, x, y, z int) float64

Get a single value

func (*Array) Init

func (a *Array) Init(components int, size3D []int, alloc bool)

Initializes the array to hold a field with the number of components and given size.

a.Init(3, []int{10, 10, 10}, DO_ALLOC) // gives an array of 1000 3-vectors
a.Init(1, []int{10, 10, 10}, DO_ALLOC) // gives an array of 1000 scalars
a.Init(6, []int{10, 10, 10}, DO_ALLOC) // gives an array of 1000 6-vectors or symmetric tensors

Storage is allocated only if alloc == true.

func (*Array) IsNil

func (a *Array) IsNil() bool

True if the array has no underlying GPU storage. E.g., when created by NilArray()

func (*Array) Len

func (a *Array) Len() int

Total number of elements

func (*Array) LocalCopy

func (src *Array) LocalCopy() *host.Array

DEBUG: Make a freshly allocated copy on the host.

func (*Array) NComp

func (a *Array) NComp() int

Number of components (1: scalar, 3: vector, ...).

func (*Array) PartLen3D

func (a *Array) PartLen3D() int

Number of elements per component per GPU

func (*Array) PartLen4D

func (a *Array) PartLen4D() int

Total number of elements per GPU
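How the element counts relate to each other can be sketched as follows. The type sizes and the function count are hypothetical, and the sketch assumes that Len() includes all components and that the 3D volume divides evenly over the GPUs.

```go
package main

import "fmt"

// sizes holds the element counts for a hypothetical array.
type sizes struct{ total, part3D, part4D int }

// count derives the counts from the number of components, the 3D size
// and the number of devices (assumed to divide the volume evenly).
func count(nComp int, size3D [3]int, nDev int) sizes {
	n3D := size3D[0] * size3D[1] * size3D[2]
	part3D := n3D / nDev
	return sizes{total: nComp * n3D, part3D: part3D, part4D: nComp * part3D}
}

func main() {
	s := count(3, [3]int{2, 4, 8}, 2)
	fmt.Println(s.total)  // 192: all elements, like Len()
	fmt.Println(s.part3D) // 32: per component per GPU, like PartLen3D()
	fmt.Println(s.part4D) // 96: all components per GPU, like PartLen4D()
}
```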

func (*Array) PartSize

func (a *Array) PartSize() []int

Size of each part per GPU

func (*Array) PointTo

func (shared *Array) PointTo(original *Array, offset int)

Lets the pointers of an already initialized, but not allocated array (shared) point to an allocated array (original) possibly with an offset.

func (*Array) Pointer

func (a *Array) Pointer() cu.DevicePtr

Array of pointers to parts, one per GPU.

func (*Array) Set

func (b *Array) Set(comp, x, y, z int, value float64)

Set a single value

func (*Array) Size3D

func (a *Array) Size3D() []int

Size of the vector field.

func (*Array) Size4D

func (a *Array) Size4D() []int

Number of components + size of the vector field.

func (*Array) String

func (a *Array) String() string

Human-readable string.

func (*Array) Zero

func (a *Array) Zero()

Makes all elements zero.

type FFTInterface

type FFTInterface interface {
	Forward(in, out *Array)
	Inverse(in, out *Array)
	Free()
}

Interface for any sparse FFT plan.

func NewFFTPlanX

func NewFFTPlanX(dataSize, logicSize []int) FFTInterface

type FFTPlanX

type FFTPlanX struct {
	Stream
	// contains filtered or unexported fields
}

func (*FFTPlanX) Forward

func (fft *FFTPlanX) Forward(in, out *Array)

func (*FFTPlanX) Free

func (fft *FFTPlanX) Free()

func (*FFTPlanX) Inverse

func (fft *FFTPlanX) Inverse(in, out *Array)

type Reductor

type Reductor struct {
	N int
	// contains filtered or unexported fields
}

A Reductor stores the buffers needed to reduce data across multiple GPUs. It can be used to sum data, take minima, maxima, etc.

func NewReductor

func NewReductor(nComp int, size []int) *Reductor

Makes a reductor to reduce an array of given size.

func (*Reductor) Dot

func (r *Reductor) Dot(in1, in2 *Array) float64

Takes the dot product of all elements of the arrays.

func (*Reductor) Free

func (r *Reductor) Free()

Frees the GPU buffer storage.

func (*Reductor) Init

func (r *Reductor) Init(nComp int, size []int)

Initializes buffers to reduce an array of given size.

func (*Reductor) Max

func (r *Reductor) Max(in *Array) float64

Takes the maximum of all elements of the array.

func (*Reductor) MaxAbs

func (r *Reductor) MaxAbs(in *Array) float64

Takes the maximum of absolute values of all elements of the array.

func (*Reductor) MaxDiff

func (r *Reductor) MaxDiff(a, b *Array) float64

Takes the maximum absolute difference between the elements of a and b.

func (*Reductor) MaxNorm

func (r *Reductor) MaxNorm(a *Array) float64

Takes the maximum norm of a 3-component (vector) array.
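A CPU reference for the maximum norm of a 3-component field might look like this. The function maxNorm is hypothetical; it assumes the result is the square root of the largest squared norm, in line with the PartialMaxNorm3Sq primitive listed earlier.

```go
package main

import (
	"fmt"
	"math"
)

// maxNorm returns the largest Euclidean norm over all cells of a field
// given by its x, y and z component slices.
func maxNorm(x, y, z []float64) float64 {
	maxSq := 0.0
	for i := range x {
		sq := x[i]*x[i] + y[i]*y[i] + z[i]*z[i]
		if sq > maxSq {
			maxSq = sq
		}
	}
	return math.Sqrt(maxSq) // take the root once, after the reduction
}

func main() {
	x := []float64{3, 0}
	y := []float64{4, 0}
	z := []float64{0, 1}
	fmt.Println(maxNorm(x, y, z)) // 5
}
```

Reducing the squared norms and taking a single square root at the end avoids one square root per cell.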

func (*Reductor) MaxNormDiff

func (r *Reductor) MaxNormDiff(a, b *Array) float64

Takes the maximum norm of the difference between two 3-component (vector) arrays.

func (*Reductor) MaxSum

func (r *Reductor) MaxSum(a, b *Array) float64

Takes the maximum absolute sum between the elements of a and b.

func (*Reductor) Min

func (r *Reductor) Min(in *Array) float64

Takes the minimum of all elements of the array.

func (*Reductor) Sum

func (r *Reductor) Sum(in *Array) float64

Takes the sum of all elements of the array.

type Stream

type Stream cu.Stream
var STREAM0 Stream

Stream 0 on each GPU

func NewStream

func NewStream() Stream

Creates a new multi-GPU stream. Its use is similar to cu.Stream, but it operates on all GPUs at the same time.

func (Stream) Destroy

func (s Stream) Destroy()

Destroys the multi-GPU stream.

func (Stream) Ready

func (s Stream) Ready() (ready bool)

Returns true if all underlying GPU streams have completed.

func (Stream) Sync

func (s Stream) Sync()

Synchronizes with all underlying GPU-streams
