Documentation ¶
Index ¶
- func LinearAttention(g *ag.Graph, qkv QKV, mappingFunction MappingFunc, eps mat.Float) []ag.Node
- func MakeCausalMask(curIndex, seqLength int) []mat.Float
- func ScaledDotProductAttention(g *ag.Graph, qkv QKV, scaleFactor mat.Float, useCausalMask bool) (context []ag.Node, prob []mat.Matrix)
- func ScaledDotProductAttentionConcurrent(g *ag.Graph, qkv QKV, scaleFactor mat.Float) (context []ag.Node, prob []mat.Matrix)
- type KeysValuesPair
- type MappingFunc
- type Output
- type QKV
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func LinearAttention ¶
func LinearAttention(g *ag.Graph, qkv QKV, mappingFunction MappingFunc, eps mat.Float) []ag.Node
LinearAttention performs self-attention as a linear dot-product of kernel feature maps. It operates with O(N) complexity, where N is the sequence length. Reference: "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" by Katharopoulos et al. (2020)
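A minimal usage sketch follows. It assumes this is spaGO's attention package (import paths github.com/nlpodyssey/spago/pkg/ml/nn/attention, pkg/ml/ag and pkg/mat32), that QKV exposes Queries, Keys and Values as []ag.Node, and that MappingFunc has the form func(*ag.Graph, ag.Node) ag.Node; the ReLU feature map below is only an illustrative stand-in for the elu(x)+1 kernel used in the paper.

    package main

    import (
        "fmt"

        mat "github.com/nlpodyssey/spago/pkg/mat32"
        "github.com/nlpodyssey/spago/pkg/ml/ag"
        "github.com/nlpodyssey/spago/pkg/ml/nn/attention"
    )

    func main() {
        g := ag.NewGraph()

        // Helper: wrap raw vectors as graph variables (no gradients needed here).
        newNodes := func(rows ...[]mat.Float) []ag.Node {
            ns := make([]ag.Node, len(rows))
            for i, r := range rows {
                ns[i] = g.NewVariable(mat.NewVecDense(r), false)
            }
            return ns
        }

        // Toy projected queries, keys and values for a two-element sequence.
        qkv := attention.QKV{
            Queries: newNodes([]mat.Float{0.1, 0.2}, []mat.Float{0.3, 0.4}),
            Keys:    newNodes([]mat.Float{0.2, 0.1}, []mat.Float{0.4, 0.3}),
            Values:  newNodes([]mat.Float{1, 0}, []mat.Float{0, 1}),
        }

        // Kernel feature map: the paper uses elu(x)+1; ReLU is a simple stand-in.
        phi := func(g *ag.Graph, x ag.Node) ag.Node { return g.ReLU(x) }

        context := attention.LinearAttention(g, qkv, phi, 1e-12)
        for _, c := range context {
            fmt.Println(c.Value())
        }
    }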
func MakeCausalMask ¶ added in v0.5.0
func MakeCausalMask(curIndex, seqLength int) []mat.Float
MakeCausalMask returns a slice of length seqLength, filled with zeros up to curIndex and with -inf for the remaining positions.
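A small sketch of a call, assuming the spaGO import path below and that the position at curIndex itself remains unmasked (an element can attend to itself):

    package main

    import (
        "fmt"

        "github.com/nlpodyssey/spago/pkg/ml/nn/attention"
    )

    func main() {
        // Mask for a sequence of 5 elements, attending from position 2:
        // expected to look like [0 0 0 -Inf -Inf].
        mask := attention.MakeCausalMask(2, 5)
        fmt.Println(mask)
    }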
func ScaledDotProductAttention ¶
func ScaledDotProductAttention(g *ag.Graph, qkv QKV, scaleFactor mat.Float, useCausalMask bool) (context []ag.Node, prob []mat.Matrix)
ScaledDotProductAttention is a self-attention mechanism relating different positions of a single sequence in order to compute a representation of that sequence. This method requires that the query, key and value vectors have already been obtained from the input sequence. The scale factor is derived from the square root of the dimension of the key vectors (the standard Transformer divides the attention scores by √dk).
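A minimal sketch of a call, assuming the spaGO import paths below and that QKV exposes Queries, Keys and Values as []ag.Node; the scale factor follows the usual 1/√dk convention of the standard Transformer.

    package main

    import (
        "fmt"
        "math"

        mat "github.com/nlpodyssey/spago/pkg/mat32"
        "github.com/nlpodyssey/spago/pkg/ml/ag"
        "github.com/nlpodyssey/spago/pkg/ml/nn/attention"
    )

    func main() {
        g := ag.NewGraph()

        // Helper: wrap raw vectors as graph variables (no gradients needed here).
        newNodes := func(rows ...[]mat.Float) []ag.Node {
            ns := make([]ag.Node, len(rows))
            for i, r := range rows {
                ns[i] = g.NewVariable(mat.NewVecDense(r), false)
            }
            return ns
        }

        // Toy projected queries, keys and values for a three-element sequence (dk = 2).
        qkv := attention.QKV{
            Queries: newNodes([]mat.Float{0.1, 0.2}, []mat.Float{0.3, 0.4}, []mat.Float{0.5, 0.6}),
            Keys:    newNodes([]mat.Float{0.2, 0.1}, []mat.Float{0.4, 0.3}, []mat.Float{0.6, 0.5}),
            Values:  newNodes([]mat.Float{1, 0}, []mat.Float{0, 1}, []mat.Float{1, 1}),
        }

        // Scale the attention scores by 1/sqrt(dk), with dk = 2.
        scale := mat.Float(1.0 / math.Sqrt(2))

        context, weights := attention.ScaledDotProductAttention(g, qkv, scale, false)
        for i := range context {
            fmt.Println(context[i].Value(), weights[i])
        }
    }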
Types ¶
type KeysValuesPair ¶ added in v0.5.0
KeysValuesPair contains Keys and Values.
type MappingFunc ¶
MappingFunc is a mapping function used by LinearAttention.
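The exact signature is not shown on this page; assuming it has the form func(g *ag.Graph, x ag.Node) ag.Node, a kernel feature map might be written as below (the linear-attention paper uses elu(x)+1; ReLU is shown purely as an illustrative stand-in, and the package name here is hypothetical).

    package attentionmaps

    import "github.com/nlpodyssey/spago/pkg/ml/ag"

    // reluFeatureMap applies ReLU element-wise to produce non-negative features,
    // making it usable as a MappingFunc for LinearAttention.
    func reluFeatureMap(g *ag.Graph, x ag.Node) ag.Node {
        return g.ReLU(x)
    }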
type Output ¶ added in v0.5.0
    type Output struct {
        // AttOutput is the result of the self-attention.
        AttOutput []ag.Node
        // AttWeights are the attention scores for each element of the sequence.
        AttWeights []mat.Matrix
        // ProjKeysValues is the list of Keys and Values used to compute the self-attention.
        ProjKeysValues KeysValuesPair
    }
Output aggregates the multiple outputs of the self-attention, including the attention scores and the last projected keys and values.
type QKV ¶
QKV groups the queries, keys and values used by the self-attention functions, as described in "Attention Is All You Need" (Vaswani et al., 2017 - http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf).
Directories ¶
Path | Synopsis
---|---
lshattention | Package lshattention provides an implementation of the LSH-Attention model, as described in `Reformer: The Efficient Transformer` by N. Kitaev, Ł. Kaiser, A. Levskaya (https://arxiv.org/pdf/2001.04451.pdf).
syntheticattention | Package syntheticattention provides an implementation of the Synthetic Attention described in "SYNTHESIZER: Rethinking Self-Attention in Transformer Models" by Tay et al., 2020.