multiheadattention

package
v1.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Oct 30, 2023 License: BSD-2-Clause Imports: 11 Imported by: 4

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Cache

type Cache []selfattention.Cache

Cache contains the self-attention cache for each head.

func (Cache) At

func (r Cache) At(i int) selfattention.Cache
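Conceptually, Cache is just an indexable slice holding one self-attention cache per head, and At is the per-head accessor. A minimal stdlib sketch of the same pattern (the headCache layout is hypothetical and stands in for selfattention.Cache; it is not spago's actual type):

```go
package main

import "fmt"

// headCache stands in for selfattention.Cache: the keys and values
// a single head has accumulated so far (hypothetical layout).
type headCache struct {
	keys, values [][]float64
}

// cache mirrors the package's Cache type: one entry per head.
type cache []headCache

// at returns the cache for head i, as Cache.At does.
func (c cache) at(i int) headCache {
	return c[i]
}

func main() {
	c := make(cache, 4) // one cache slot per head
	c[2].keys = append(c[2].keys, []float64{0.1, 0.2})
	fmt.Println(len(c.at(2).keys)) // → 1
}
```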

type Model

type Model struct {
	nn.Module
	Heads       []*selfattention.Model
	OutputMerge *linear.Model
}

Model contains the serializable parameters.

func New

func New[T float.DType](size, numOfHeads int, useCausalMask, isCrossAttention bool) *Model

New returns a new model with parameters initialized to zeros.
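The size argument is the model dimension, which is conventionally split evenly across the heads. A stdlib sketch of the shapes a constructor like New would set up (headSpec and newHeads are illustrative names, not spago's API):

```go
package main

import "fmt"

// headSpec records the input and output dimension of one head's
// projection (illustrative, not spago's internals).
type headSpec struct {
	in, out int
}

// newHeads sketches what a constructor like New sets up: numOfHeads
// attention heads, each projecting the model dimension size down to
// size/numOfHeads, before an output merge layer maps the concatenated
// heads back to size.
func newHeads(size, numOfHeads int) []headSpec {
	headDim := size / numOfHeads // conventional per-head dimension
	heads := make([]headSpec, numOfHeads)
	for i := range heads {
		heads[i] = headSpec{in: size, out: headDim}
	}
	return heads
}

func main() {
	heads := newHeads(512, 8)
	fmt.Println(len(heads), heads[0].out) // → 8 64
}
```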

func (*Model) Forward

func (m *Model) Forward(cache Cache, q, x []mat.Tensor) ([]mat.Tensor, [][]mat.Tensor, Cache)

Forward performs the forward step for each input node. It returns the merged attention outputs, the per-head attention weights, and the updated Cache.

func (*Model) Init

func (m *Model) Init(rng *rand.LockedRand)

Init initializes the self-attention heads and the merge layer with a Xavier (Glorot) uniform distribution.
