Documentation ¶
Overview ¶
Package cui2vec implements utilities for working with cui2vec embeddings and for mapping CUIs to text.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type AliasMapping ¶
func LoadCUIAliasMapping ¶
func LoadCUIAliasMapping(path string) (AliasMapping, error)
type Embeddings ¶
Embeddings is a complete cui2vec file loaded into memory.
type Mapping ¶
func LoadCUIFrequencyMapping ¶
func LoadCUIMapping ¶
LoadCUIMapping loads a mapping from each CUI to its most common title.
The CUI-to-title mapping is constructed as described in: Jimmy, Zuccon G., Koopman B. (2018) Choices in Knowledge-Base Retrieval for Consumer Health Search. In: Pasi G., Piwowarski B., Azzopardi L., Hanbury A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham
The input file must reflect this construction.
type PrecomputedEmbeddings ¶
PrecomputedEmbeddings is a cui2vec container in which the distances between CUIs have been pre-computed. It contains a sparse matrix where each row corresponds to a CUI and the columns hold the distances to other CUIs. Each row has the form [CUI, score, CUI, score, ...]. Each CUI must be converted back to a string, and each score must be re-normalised from an int back to a float; the Similar method takes care of both.
func NewPrecomputedEmbeddings ¶
func NewPrecomputedEmbeddings(r io.Reader) (*PrecomputedEmbeddings, error)
func (*PrecomputedEmbeddings) LoadModel ¶
func (v *PrecomputedEmbeddings) LoadModel(r io.Reader) error
LoadModel reads a model from r into memory. The pre-computed distances file is a single, continuous byte sequence. The first four bytes encode a single Uint32 giving the number of rows in the matrix, which is used to create a fixed-size sparse matrix. The `Cols` attribute of the `PrecomputedEmbeddings` type then determines how many four-byte Uint32 numbers are read at a time to populate the columns of each row.
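The layout described above can be read with the standard `encoding/binary` package. The following is a minimal sketch, not the package's implementation: `readMatrix` and `readMatrixBytes` are hypothetical helpers, the `cols` argument stands in for the `Cols` attribute, and little-endian byte order is an assumption, since the byte order is not stated here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readMatrix is a hypothetical helper, not part of the cui2vec package.
// It reads a uint32 row count, then cols uint32 values for each row.
// ASSUMPTION: little-endian byte order; the documentation does not state it.
func readMatrix(r io.Reader, cols int) ([][]uint32, error) {
	var rows uint32
	if err := binary.Read(r, binary.LittleEndian, &rows); err != nil {
		return nil, fmt.Errorf("reading row count: %w", err)
	}
	var matrix [][]uint32
	for i := uint32(0); i < rows; i++ {
		// Each row is exactly cols values wide, four bytes per value.
		row := make([]uint32, cols)
		if err := binary.Read(r, binary.LittleEndian, row); err != nil {
			return nil, fmt.Errorf("reading row %d: %w", i, err)
		}
		matrix = append(matrix, row)
	}
	return matrix, nil
}

// readMatrixBytes is a convenience wrapper for data already in memory.
func readMatrixBytes(data []byte, cols int) ([][]uint32, error) {
	return readMatrix(bytes.NewReader(data), cols)
}
```

A truncated input surfaces as an error from `binary.Read`, so a caller can tell a short file apart from a valid one.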
func (*PrecomputedEmbeddings) Similar ¶
func (v *PrecomputedEmbeddings) Similar(cui string) ([]Concept, error)
Similar matches a given input CUI to the `Cols`-closest CUIs in the cui2vec embedding space. Each row in the matrix is encoded as (CUI, score) pairs, and this method decodes them: each int value in the row is converted into either a string CUI or a re-normalised softmax score (float64).
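To illustrate the pair decoding described above, here is a minimal sketch under several assumptions that are not confirmed by this documentation: that an integer CUI maps to a string as "C" followed by seven zero-padded digits, that scores were scaled by a fixed factor (the hypothetical `scoreScale`), and that a CUI value of zero marks padding. The `scoredCUI` type and `decodeRow` function are illustrative names, not the package's `Concept` type or its actual conversion.

```go
package main

import "fmt"

// scoreScale is a HYPOTHETICAL scaling factor used to turn an integer
// score back into a float. The real factor is not documented here.
const scoreScale = 1000000.0

// scoredCUI is an illustrative stand-in for a (CUI, score) pair.
type scoredCUI struct {
	CUI   string
	Score float64
}

// decodeRow walks a row laid out as [CUI, score, CUI, score, ...].
// ASSUMPTIONS: integer CUIs render as "C" plus seven zero-padded digits,
// and a CUI of zero is padding and is skipped.
func decodeRow(row []uint32) []scoredCUI {
	out := make([]scoredCUI, 0, len(row)/2)
	for i := 0; i+1 < len(row); i += 2 {
		if row[i] == 0 {
			continue // zero padding, no pair stored here
		}
		out = append(out, scoredCUI{
			CUI:   fmt.Sprintf("C%07d", row[i]),
			Score: float64(row[i+1]) / scoreScale,
		})
	}
	return out
}
```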
func (*PrecomputedEmbeddings) WriteModel ¶
func (v *PrecomputedEmbeddings) WriteModel(w io.Writer) error
WriteModel writes a pre-computed distance matrix to w. The output begins with a four-byte sequence, to be parsed as a Uint32, representing the size of the matrix. The values of the matrix then follow one by one in a continuous byte sequence, with each element encoded as a four-byte sequence to be parsed as a Uint32. Elements are written row by row, and each row is exactly `Cols` wide. If a row has fewer than `Cols` elements, it is padded with zeros.
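The write side of this format, including the zero padding, can be sketched as follows. This is an illustration only: `writeMatrix` and `encodeMatrix` are hypothetical helpers, `cols` stands in for the `Cols` attribute, and little-endian byte order is an assumption. Truncating rows longer than `cols` is a choice made for this sketch; the documentation does not say how such rows are handled.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"io"
)

// writeMatrix is a hypothetical helper, not part of the cui2vec package.
// It writes the row count as a uint32, then every row as cols uint32 values.
// ASSUMPTION: little-endian byte order; the documentation does not state it.
func writeMatrix(w io.Writer, matrix [][]uint32, cols int) error {
	if err := binary.Write(w, binary.LittleEndian, uint32(len(matrix))); err != nil {
		return err
	}
	for _, row := range matrix {
		// make returns a zero-filled slice, so short rows are padded with zeros.
		padded := make([]uint32, cols)
		// copy stops at the shorter slice, so rows longer than cols are
		// truncated (a choice made for this sketch).
		copy(padded, row)
		if err := binary.Write(w, binary.LittleEndian, padded); err != nil {
			return err
		}
	}
	return nil
}

// encodeMatrix is a convenience wrapper that returns the encoded bytes.
func encodeMatrix(matrix [][]uint32, cols int) ([]byte, error) {
	var buf bytes.Buffer
	if err := writeMatrix(&buf, matrix, cols); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

Because every row has the same width, the output size is always 4 + rows × cols × 4 bytes, which lets a reader locate any row without scanning.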
type SimResponse ¶
type SimResponse struct {
V []Concept
}
type UncompressedEmbeddings ¶
func (*UncompressedEmbeddings) LoadModel ¶
func (v *UncompressedEmbeddings) LoadModel(r io.Reader) error
LoadModel loads a cui2vec pre-trained model into memory. The pre-trained file described in:
https://arxiv.org/pdf/1804.01486.pdf
and downloaded from:
https://figshare.com/s/00d69861786cd0156d81
is a CSV file. The skipFirst parameter determines whether the first line of the file should be skipped.
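As a rough illustration of loading such a CSV file, the sketch below uses the standard `encoding/csv` package. It assumes that each record holds a CUI in the first column followed by the vector components, which is not spelled out here. `loadCSVEmbeddings` and `loadCSVString` are hypothetical helpers, and the returned map is an illustrative container, not the package's `UncompressedEmbeddings` type.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// loadCSVEmbeddings is a hypothetical helper, not part of the cui2vec package.
// ASSUMPTION: each record is a CUI followed by its vector components.
// When skipFirst is true, the first line (for example a header) is ignored.
func loadCSVEmbeddings(r io.Reader, skipFirst bool) (map[string][]float64, error) {
	reader := csv.NewReader(r)
	embeddings := make(map[string][]float64)
	first := true
	for {
		record, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		if first {
			first = false
			if skipFirst {
				continue
			}
		}
		if len(record) < 2 {
			return nil, fmt.Errorf("record for %q has no vector components", record[0])
		}
		vec := make([]float64, len(record)-1)
		for i, field := range record[1:] {
			v, err := strconv.ParseFloat(field, 64)
			if err != nil {
				return nil, fmt.Errorf("parsing component %d of %q: %w", i, record[0], err)
			}
			vec[i] = v
		}
		embeddings[record[0]] = vec
	}
	return embeddings, nil
}

// loadCSVString is a convenience wrapper for CSV text already in memory.
func loadCSVString(s string, skipFirst bool) (map[string][]float64, error) {
	return loadCSVEmbeddings(strings.NewReader(s), skipFirst)
}
```

If the file has a header line and skipFirst is false, the header's column names fail to parse as floats and the sketch returns an error instead of storing a bogus vector.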
type VecClient ¶
type VecClient struct {
// contains filtered or unexported fields
}
func NewVecClient ¶
type VecResponse ¶
type VecResponse struct {
V []float64
}