Documentation ¶
Overview ¶
Package bow provides a representation of a bag-of-words (BOW) along with definitions of common operations. These operations include computing the cosine or Euclidean distance between two BOWs, comparing BOWs, and producing BOWs from values of other types (like a PDB chain or a biological sequence).
This package also includes functions for interoperating with the original FragBag implementation written by Rachel Kolodny. Namely, BOWs in the original implementation are encoded as strings (Bow.StringOldStyle writes them and NewOldStyleBow reads them).
Index ¶
- type Bow
- func (b Bow) Add(b2 Bow) Bow
- func (b Bow) Cosine(b2 Bow) float64
- func (b Bow) Dot(b2 Bow) float64
- func (b Bow) Equal(b2 Bow) bool
- func (b Bow) Euclid(b2 Bow) float64
- func (b Bow) Len() int
- func (b Bow) Magnitude() float64
- func (b Bow) String() string
- func (b Bow) StringOldStyle() string
- func (b Bow) Weighted(lib fragbag.WeightedLibrary) Bow
- type BowDiff
- type Bowed
- type SequenceBower
- type StructureBower
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Bow ¶
type Bow struct {
	// Freqs is a map from fragment number to the number of occurrences of
	// that fragment in this "bag of words." This map always has size
	// equivalent to the size of the library.
	Freqs []float32
}
Bow represents a bag-of-words vector of size N for a particular fragment library, where N corresponds to the number of fragments in the fragment library.
Note that a Bow may be weighted. It is up to the fragment library to apply weights to a Bow.
func NewOldStyleBow ¶
NewOldStyleBow returns a bag-of-words from Fragbag's original bag-of-words vector output.
The format works by assigning the first 26 fragment numbers the letters 'a' ... 'z', the next 26 fragment numbers the letters 'A' ... 'Z', and any additional fragment numbers to 52, 53, 54, ..., etc. Moreover, the numbers are delimited by a '#' character, while the letters aren't delimited by anything.
Please see the documentation for (Bow).StringOldStyle for a production rule.
If the string is malformed, NewOldStyleBow will return an error.
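As a rough illustration of the decoding described above, here is a standalone sketch that does not use this package. The name parseOldStyle and its size parameter are hypothetical. It assumes that each token stands for one occurrence of a fragment, and it rejects numeric tokens below 52, which is a strictness choice made for this sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOldStyle decodes an old-style string into a frequency vector with
// the given number of fragments. Assumption: every token (a letter, or a
// number followed by '#') counts as one occurrence of that fragment.
func parseOldStyle(size int, s string) ([]float32, error) {
	freqs := make([]float32, size)
	for i := 0; i < len(s); {
		c := s[i]
		var frag int
		switch {
		case c >= 'a' && c <= 'z':
			frag = int(c - 'a')
			i++
		case c >= 'A' && c <= 'Z':
			frag = int(c-'A') + 26
			i++
		case c >= '0' && c <= '9':
			end := strings.IndexByte(s[i:], '#')
			if end < 0 {
				return nil, fmt.Errorf("number at offset %d has no '#' terminator", i)
			}
			n, err := strconv.Atoi(s[i : i+end])
			if err != nil {
				return nil, fmt.Errorf("invalid fragment number %q: %v", s[i:i+end], err)
			}
			if n < 52 {
				// Strictness chosen for this sketch: small numbers use letters.
				return nil, fmt.Errorf("fragment number %d must be written as a letter", n)
			}
			frag = n
			i += end + 1
		default:
			return nil, fmt.Errorf("unexpected character %q at offset %d", c, i)
		}
		if frag >= size {
			return nil, fmt.Errorf("fragment number %d out of range for size %d", frag, size)
		}
		freqs[frag]++
	}
	return freqs, nil
}

func main() {
	freqs, err := parseOldStyle(54, "aabZ53#")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for frag, f := range freqs {
		if f != 0 {
			fmt.Printf("fragment %d: %g\n", frag, f)
		}
	}
}
```

Because the decoder only counts tokens, two strings that list the same fragments in a different order decode to the same vector.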
func SequenceBow ¶
func SequenceBow(lib fragbag.SequenceLibrary, s seq.Sequence) Bow
SequenceBow is a helper function to compute a bag-of-words given a sequence fragment library and a query sequence.
If the lib given is a weighted library, then the BOW returned will also be weighted.
Note that this function should only be used when providing your own implementation of the SequenceBower interface. Otherwise, BOWs should be computed using the SequenceBow method of the interface.
func StructureBow ¶
func StructureBow(lib fragbag.StructureLibrary, atoms []structure.Coords) Bow
StructureBow is a helper function to compute a bag-of-words given a structure fragment library and a list of alpha-carbon atoms.
If the lib given is a weighted library, then the Bow returned will also be weighted.
Note that this function should only be used when providing your own implementation of the StructureBower interface. Otherwise, BOWs should be computed using the StructureBow method of the interface.
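To give a feel for what computing a bag-of-words from alpha-carbon atoms involves, here is a self-contained toy sketch. It follows the general FragBag idea of sliding a window over the backbone, assigning each window to its best-matching library fragment, and counting the assignments. Everything in it is hypothetical: coords and toyLibrary are stand-ins for structure.Coords and fragbag.StructureLibrary, and the matching rule (closest end-to-end distance) is a deliberately crude substitute for a real geometric comparison. It ignores weighting entirely.

```go
package main

import (
	"fmt"
	"math"
)

// coords is a stand-in for a 3D atom coordinate.
type coords struct{ X, Y, Z float64 }

// toyLibrary is a hypothetical stand-in for a structure fragment library.
// Each "fragment" is reduced to a single end-to-end distance; a real
// library would compare full fragment geometry instead.
type toyLibrary struct {
	fragmentSize int       // number of atoms per fragment window
	spans        []float64 // one end-to-end distance per fragment
}

// dist returns the Euclidean distance between two coordinates.
func dist(a, b coords) float64 {
	dx, dy, dz := a.X-b.X, a.Y-b.Y, a.Z-b.Z
	return math.Sqrt(dx*dx + dy*dy + dz*dz)
}

// best returns the index of the fragment whose span is closest to the
// end-to-end distance of the given window.
func (l toyLibrary) best(window []coords) int {
	span := dist(window[0], window[len(window)-1])
	bestIdx, bestDiff := 0, math.Inf(1)
	for i, s := range l.spans {
		if d := math.Abs(s - span); d < bestDiff {
			bestIdx, bestDiff = i, d
		}
	}
	return bestIdx
}

// structureBow slides a window of fragmentSize atoms over the chain and
// counts how often each fragment is the best match.
func structureBow(lib toyLibrary, atoms []coords) []float32 {
	freqs := make([]float32, len(lib.spans))
	for i := 0; i+lib.fragmentSize <= len(atoms); i++ {
		freqs[lib.best(atoms[i:i+lib.fragmentSize])]++
	}
	return freqs
}

func main() {
	lib := toyLibrary{fragmentSize: 3, spans: []float64{2, 4}}
	atoms := []coords{{0, 0, 0}, {1, 0, 0}, {2, 0, 0}, {4.5, 0, 0}, {6.5, 0, 0}}
	fmt.Println(structureBow(lib, atoms)) // prints [1 2]
}
```

The resulting vector always has one entry per library fragment, and a chain shorter than the fragment size yields an all-zero vector.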
func (Bow) Add ¶
Add sums the corresponding fragment frequencies of the two Bows and returns the result as a new Bow. Add will panic if the operands have different lengths.
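The element-wise behavior of Add can be sketched without importing the package. The Bow type below is a stand-in holding only Freqs, and the free function add is a hypothetical name; it is a sketch of the documented contract (new vector returned, panic on length mismatch), not the package's code.

```go
package main

import "fmt"

// Bow is a stand-in for the package's Bow type, reduced to its Freqs field.
type Bow struct {
	Freqs []float32
}

// add returns a new Bow whose frequencies are the pairwise sums of a and b.
// It panics when the lengths differ, mirroring the documented contract.
func add(a, b Bow) Bow {
	if len(a.Freqs) != len(b.Freqs) {
		panic(fmt.Sprintf("cannot add bows of length %d and %d", len(a.Freqs), len(b.Freqs)))
	}
	sum := Bow{Freqs: make([]float32, len(a.Freqs))}
	for i := range a.Freqs {
		sum.Freqs[i] = a.Freqs[i] + b.Freqs[i]
	}
	return sum
}

func main() {
	a := Bow{Freqs: []float32{1, 0, 2}}
	b := Bow{Freqs: []float32{0, 3, 1}}
	fmt.Println(add(a, b).Freqs) // prints [1 3 3]
}
```

Allocating a fresh slice keeps both operands unchanged, which is what "returns a new Bow" suggests.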
func (Bow) Equal ¶
Equal tests whether two Bows are equal.
Two Bows are equal when every fragment has the same frequency in both.
func (Bow) Len ¶
Len returns the size of the vector. This is always equal to the number of fragments in the corresponding library.
func (Bow) String ¶
String returns a string representation of the Bow vector. Only fragments with non-zero frequency are emitted.
The output looks like '{fragNum: frequency, fragNum: frequency, ...}'. For example, '{1: 4, 3: 1}' means that all fragment numbers except '1' and '3' have a frequency of zero.
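A minimal sketch of a formatter that produces output in this shape follows. The name formatBow is hypothetical, and two details are assumptions: fragment numbers are taken to be zero-based slice indices, and frequencies are printed with the %g verb. The package's String method may format numbers differently.

```go
package main

import (
	"fmt"
	"strings"
)

// formatBow renders only the non-zero frequencies as "fragNum: frequency"
// pairs. Fragment numbers are assumed to be zero-based slice indices.
func formatBow(freqs []float32) string {
	parts := make([]string, 0, len(freqs))
	for fragNum, freq := range freqs {
		if freq != 0 {
			parts = append(parts, fmt.Sprintf("%d: %g", fragNum, freq))
		}
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

func main() {
	fmt.Println(formatBow([]float32{0, 4, 0, 1})) // prints {1: 4, 3: 1}
}
```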
func (Bow) StringOldStyle ¶
StringOldStyle returns a bag-of-words vector formatted as a string that matches the old Fragbag program's output.
The format works by assigning the first 26 fragment numbers the letters 'a' ... 'z', the next 26 fragment numbers the letters 'A' ... 'Z', and any additional fragment numbers to 52, 53, 54, ..., etc. Moreover, the numbers are delimited by a '#' character, while the letters aren't delimited by anything.
Here is a grammar describing the output:
output = { fragment }
fragment = lower-letter | upper-letter | { integer }, "#"
lower-letter = "a" | ... | "z"
upper-letter = "A" | ... | "Z"
integer = "0" | ... | "9"
The essential invariants are that any fragment number less than 52 is described by elements in the set { 'a', ..., 'z', 'A', ..., 'Z' } and any fragment number greater than or equal to 52 is described by a corresponding integer (>= 52) followed by a '#' character.
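The grammar and invariants above can be exercised with a small standalone encoder. This is a sketch, not the package's implementation: oldStyleString is a hypothetical name, it assumes each occurrence of a fragment is written as one token, it emits fragments in ascending order, and it truncates fractional (weighted) frequencies to whole counts.

```go
package main

import (
	"fmt"
	"strings"
)

// oldStyleString writes one token per fragment occurrence: a lowercase
// letter for fragments 0-25, an uppercase letter for fragments 26-51, and
// the decimal number followed by '#' for fragments 52 and above.
// Assumption: fractional frequencies are truncated to whole counts.
func oldStyleString(freqs []float32) string {
	var sb strings.Builder
	for fragNum, freq := range freqs {
		for n := 0; n < int(freq); n++ {
			switch {
			case fragNum < 26:
				sb.WriteByte(byte('a' + fragNum))
			case fragNum < 52:
				sb.WriteByte(byte('A' + fragNum - 26))
			default:
				fmt.Fprintf(&sb, "%d#", fragNum)
			}
		}
	}
	return sb.String()
}

func main() {
	freqs := make([]float32, 54)
	freqs[0], freqs[1], freqs[51], freqs[53] = 2, 1, 1, 1
	fmt.Println(oldStyleString(freqs)) // prints aabZ53#
}
```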
Note that the string returned by this function will not hold up under string equality with Fragbag's output. Namely, Fragbag outputs fragment numbers in an arbitrary order (probably the order in which they are found in the input PDB file). This order is not captured or preserved by BOW values in this package. Thus, the only way to truly test for equality is to convert Fragbag's output to a BOW using NewOldStyleBow and then compare the result with the (Bow).Equal method.
type BowDiff ¶
type BowDiff struct {
Freqs []float32
}
BowDiff represents the difference between two bag-of-words vectors. The types are quite similar, except that in a BowDiff each entry of Freqs holds the difference in frequency for a particular fragment number rather than a count.
The BOW difference is simply the pairwise differences of fragment frequencies.
func NewBowDiff ¶
NewBowDiff creates a new BowDiff by subtracting the 'old' frequencies from the 'new' frequencies.
NewBowDiff will panic if 'oldbow' and 'newbow' have different lengths.
func (BowDiff) IsSame ¶
IsSame returns true if there are no differences. (i.e., all diff frequencies are zero.)
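The subtraction and the IsSame check can be sketched as follows. The types are local stand-ins and the lowercase names newBowDiff and isSame are hypothetical; the sketch only mirrors the documented contract (new minus old, panic on length mismatch, same when every difference is zero).

```go
package main

import "fmt"

// Bow and BowDiff are stand-ins for the package's types.
type Bow struct {
	Freqs []float32
}

type BowDiff struct {
	Freqs []float32
}

// newBowDiff subtracts the old frequencies from the new ones, pairwise.
// It panics when the two vectors have different lengths.
func newBowDiff(oldbow, newbow Bow) BowDiff {
	if len(oldbow.Freqs) != len(newbow.Freqs) {
		panic("cannot diff bows of different lengths")
	}
	d := BowDiff{Freqs: make([]float32, len(oldbow.Freqs))}
	for i := range oldbow.Freqs {
		d.Freqs[i] = newbow.Freqs[i] - oldbow.Freqs[i]
	}
	return d
}

// isSame reports whether every difference is zero.
func (d BowDiff) isSame() bool {
	for _, f := range d.Freqs {
		if f != 0 {
			return false
		}
	}
	return true
}

func main() {
	oldbow := Bow{Freqs: []float32{2, 0, 1}}
	newbow := Bow{Freqs: []float32{2, 3, 0}}
	d := newBowDiff(oldbow, newbow)
	fmt.Println(d.Freqs, d.isSame()) // prints [0 3 -1] false
}
```

A negative entry means the fragment occurs less often in the new vector than in the old one.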
func (BowDiff) String ¶
String returns a string representation of the BOW diff vector. Only fragments with non-zero differences are emitted.
The output looks like '{fragNum: diff-frequency, fragNum: diff-frequency, ...}'. For example, '{1: 4, 3: 1}' means that all fragment numbers except '1' and '3' have a difference frequency of zero.
type Bowed ¶
type Bowed struct {
	// A globally unique identifier corresponding to the source of the bow.
	// e.g., a PDB identifier "1ctf" or a PDB identifier with a chain
	// identifier "1ctfA" or a sequence accession number.
	Id string

	// Arbitrary data associated with the source. May be empty.
	Data []byte

	// The bag-of-words.
	Bow Bow
}
Bowed corresponds to a bag-of-words with metadata about its source. For example, a PDB chain can have a BOW computed for it. Metadata might include that chain's identifier (e.g., 1ctfA) and perhaps that chain's sequence.
Values of this type correspond to records in a BOW database.
type SequenceBower ¶
type SequenceBower interface {
	// Computes a bag-of-words given a sequence fragment library.
	SequenceBow(lib fragbag.SequenceLibrary) Bowed
}
SequenceBower corresponds to Bower values that can provide BOWs given a sequence fragment library.
func BowerFromSequence ¶
func BowerFromSequence(s seq.Sequence) SequenceBower
BowerFromSequence provides a reference implementation of the SequenceBower interface for biological sequences.
type StructureBower ¶
type StructureBower interface {
	// Computes a bag-of-words given a structure fragment library.
	// For example, to compute the bag-of-words of a chain in a PDB entry:
	//
	//	lib := someStructureFragmentLibrary()
	//	chain := somePdbChain()
	//	fmt.Println(BowerFromChain(chain).StructureBow(lib))
	//
	// This is made easier by using pre-defined types in this package that
	// implement this interface.
	StructureBow(lib fragbag.StructureLibrary) Bowed
}
StructureBower corresponds to Bower values that can provide BOWs given a structure fragment library.
func BowerFromChain ¶
func BowerFromChain(c *pdb.Chain) StructureBower
BowerFromChain provides a reference implementation of the StructureBower interface for PDB chains.
func BowerFromCifChain ¶
func BowerFromCifChain(c *pdbx.Chain) StructureBower
BowerFromCifChain provides a reference implementation of the StructureBower interface for chains in PDBx/mmCIF formatted files.
func BowerFromModel ¶
func BowerFromModel(c *pdb.Model) StructureBower
BowerFromModel provides a reference implementation of the StructureBower interface for PDB models.