verifier

package
v0.9.1
Warning

This package is not in the latest version of its module.

Published: Feb 14, 2022 License: MIT Imports: 3 Imported by: 27

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CurationLevel

type CurationLevel int

CurationLevel tells whether a matched result was returned by at least one DataSource in one of the following categories.

const (
	// NotCurated means that all DataSources where the name-string was matched
	// are not curated sufficiently.
	NotCurated CurationLevel = iota

	// AutoCurated means that at least one of the returned DataSources invested
	// significantly in curating their data by scripts.
	AutoCurated

	// Curated means that at least one DataSource is marked as sufficiently
	// curated. It does not mean that the particular match was manually checked
	// though.
	Curated
)
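The "at least one DataSource" wording can be read as taking the highest curation level among the sources that matched a name. The following is a minimal sketch under that reading. The type and constants are redeclared locally so the snippet runs on its own, and `bestCuration` is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// CurationLevel mirrors the package type so the sketch is self-contained.
type CurationLevel int

const (
	NotCurated CurationLevel = iota
	AutoCurated
	Curated
)

// bestCuration is a hypothetical helper. It returns the highest curation
// level found among the matched data sources, or NotCurated when the
// slice is empty.
func bestCuration(levels []CurationLevel) CurationLevel {
	best := NotCurated
	for _, l := range levels {
		if l > best {
			best = l
		}
	}
	return best
}

func main() {
	// One auto-curated source is enough to lift the result to AutoCurated.
	fmt.Println(bestCuration([]CurationLevel{NotCurated, AutoCurated}) == AutoCurated)
	// One curated source is enough to lift the result to Curated.
	fmt.Println(bestCuration([]CurationLevel{AutoCurated, Curated, NotCurated}) == Curated)
}
```

Because the constants are declared with iota in ascending order of curation effort, a plain integer comparison is enough here.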

func (CurationLevel) MarshalJSON

func (cl CurationLevel) MarshalJSON() ([]byte, error)

MarshalJSON implements the json.Marshaler interface and converts a CurationLevel into a string.

func (CurationLevel) String

func (cl CurationLevel) String() string

func (*CurationLevel) UnmarshalJSON

func (cl *CurationLevel) UnmarshalJSON(bs []byte) error

UnmarshalJSON implements the json.Unmarshaler interface and converts a string into a CurationLevel.

type DataSource

type DataSource struct {
	// ID is a DataSource Id.
	ID int `json:"id"`

	// UUID generated by GlobalNames and associated with the DataSource
	UUID string `json:"uuid,omitempty"`

	// Title is a full title of a DataSource
	Title string `json:"title"`

	// TitleShort is a shortened/abbreviated title of a DataSource.
	TitleShort string `json:"titleShort"`

	// Version of the data-set for a DataSource.
	Version string `json:"version,omitempty"`

	// RevisionDate of a data-set from a data-provider.
	// It follows format of 'YYYY-MM-DD' || 'YYYY-MM' || 'YYYY'
	// This data comes from the information given by the data-provider,
	// while UpdatedAt field is the date of harvesting of the
	// resource.
	RevisionDate string `json:"releaseDate,omitempty"`

	// DOI of a DataSource.
	DOI string `json:"doi,omitempty"`

	// Citation representing a DataSource
	Citation string `json:"citation,omitempty"`

	// Authors associated with the DataSource
	Authors string `json:"authors,omitempty"`

	// Description of the DataSource.
	Description string `json:"description,omitempty"`

	// WebsiteURL is a homepage of a DataSource.
	WebsiteURL string `json:"homeURL,omitempty"`

	// OutlinkURL is a template for generating outlink URLs. Verification
	// output will substitute '{}' with an OutlinkID
	OutlinkURL string `json:"-"`

	// IsOutlinkReady is true for data-sources that have enough data and
	// metadata to be recommended for outlinking by third-party applications
	// (be included into data-sources). When false, it does not
	// mean that the original resource is not valuable; it means that
	// its representation at gnames is not complete/recent enough.
	IsOutlinkReady bool `json:"isOutlinkReady,omitempty"`

	// Curation determines how much of manual or programmatic work is put
	// into assuring the quality of the data.
	Curation CurationLevel `json:"curation"`

	// RecordCount tells how many entries are in a DataSource.
	RecordCount int `json:"recordCount"`

	// UpdatedAt is the last import date (YYYY-MM-DD). In contrast,
	// RevisionDate field indicates when the resource was
	// updated according to its data-provider.
	UpdatedAt string `json:"updatedAt"`
}

DataSource provides metadata for an externally collected data-set.
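The OutlinkURL comment describes a template in which '{}' is replaced by an outlink ID. A minimal sketch of that substitution might look as follows. The helper name `buildOutlink` and the example URL are made up for illustration, and the package may perform the substitution differently.

```go
package main

import (
	"fmt"
	"strings"
)

// buildOutlink is a hypothetical helper that fills the '{}' placeholder of
// an OutlinkURL template with the given outlink ID. It returns an empty
// string when either the template or the ID is missing.
func buildOutlink(template, outlinkID string) string {
	if template == "" || outlinkID == "" {
		return ""
	}
	// Only the first placeholder is replaced.
	return strings.Replace(template, "{}", outlinkID, 1)
}

func main() {
	// The URL below is an invented example, not a real data-source template.
	fmt.Println(buildOutlink("https://example.org/taxon/{}", "12345"))
}
```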

type Input added in v0.4.3

type Input struct {
	// NameStrings is a list of name-strings to verify.
	NameStrings []string `json:"nameStrings"`

	// DataSources field contains IDs of DataSources whose matches will be
	// returned besides the best result. See the Results field in
	// Name. If `dataSources` are []int{0}, then all matched sources
	// are used.
	DataSources []int `json:"dataSources"`

	// WithAllMatches indicates that all matches per data-source are returned,
	// sorted by score (instead of the best match per source).
	WithAllMatches bool `json:"withAllMatches"`

	// WithVernaculars indicates if corresponding vernacular results will be
	// returned as well.
	WithVernaculars bool `json:"withVernaculars"`

	// WithCapitalization flag; when true, the first rune of lower-case
	// input name-strings will be capitalized if appropriate.
	WithCapitalization bool `json:"withCapitalization"`

	// WithContext flag; when true, results will return the most prevalent
	// kingdom for the text, as well as the clade which contains a given
	// percentage of all names in the text.
	//
	// For example, a context with threshold 0.5 would correspond to a clade that
	// contains at least half of all names. We use the managerial classification
	// of Catalogue of Life for the context calculation.
	WithContext bool `json:"withContext"`

	// ContextThreshold sets the minimal percentage of names in a clade
	// to be counted as a context of a text.
	//
	// Context is a clade that contains at least ContextThreshold percentage
	// of all names in the text. We use the managerial classification of
	// Catalogue of Life for the context calculation.
	ContextThreshold float32 `json:"contextThreshold"`
}

Input is options/parameters for the Verify method.
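To show how the JSON tags shape a request, here is a small sketch that encodes an Input value. The struct is copied locally so the snippet compiles without the package. The name-strings and data-source IDs are placeholder values, and `encodeInput` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Input is a local copy of the package struct, kept here so the sketch
// is self-contained.
type Input struct {
	NameStrings        []string `json:"nameStrings"`
	DataSources        []int    `json:"dataSources"`
	WithAllMatches     bool     `json:"withAllMatches"`
	WithVernaculars    bool     `json:"withVernaculars"`
	WithCapitalization bool     `json:"withCapitalization"`
	WithContext        bool     `json:"withContext"`
	ContextThreshold   float32  `json:"contextThreshold"`
}

// encodeInput is a hypothetical helper that serializes an Input to JSON.
func encodeInput(inp Input) (string, error) {
	bs, err := json.Marshal(inp)
	if err != nil {
		return "", err
	}
	return string(bs), nil
}

func main() {
	inp := Input{
		NameStrings:      []string{"Pomatomus saltatrix", "bubo bubo"},
		DataSources:      []int{1, 11}, // placeholder IDs
		WithContext:      true,
		ContextThreshold: 0.5,
	}
	// Capitalization is requested because the second name is lower-case.
	inp.WithCapitalization = true

	out, err := encodeInput(inp)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

None of the Input fields use `omitempty`, so boolean flags left at their zero value still appear in the encoded request as `false`.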

type Kingdom added in v0.4.2

type Kingdom struct {
	// KingdomName is the name of a kingdom.
	KingdomName string `json:"kingdomName"`

	// NamesNumber is the number of names found in a kingdom.
	NamesNumber int `json:"namesNumber"`

	// Percentage is a percentage of names found in a kingdom.
	Percentage float32 `json:"percentage"`
}

Kingdom provides statistics of matched names found in a particular kingdom.

type MatchTypeValue

type MatchTypeValue int

MatchTypeValue describes how a name-string matched a name in gnames database.

const (
	// NoMatch means that matching failed.
	NoMatch MatchTypeValue = iota

	// PartialFuzzy is the same as PartialExact, except that the match was
	// fuzzy rather than exact. We never do fuzzy matches for uninomials, due
	// to the high rate of false positives.
	PartialFuzzy

	// PartialExact is used if GNames failed to match the full name-string. The
	// match happened after removing either middle species epithets, or
	// chopping the 'tail' words of the input name-string canonical form.
	PartialExact

	// Fuzzy means that matches were not exact due to similarity of name-strings,
	// OCR or typing errors. Take these results with more suspicion than
	// Exact matches. Fuzzy match is never done on uninomials due to the
	// high rate of false positives.
	Fuzzy

	// Exact means either canonical form, or the whole name-string matched
	// perfectly.
	Exact

	// Virus means that a virus name was matched in the database. `Virus` is a
	// broad term and includes a variety of non-cellular entities (virus, prion,
	// plasmid, vector etc.)
	Virus

	// FacetedSearch is a match made by search procedure. It does not happen
	// during verification.
	FacetedSearch
)

func NewMatchType

func NewMatchType(t string) MatchTypeValue

NewMatchType takes a string and converts it into a MatchTypeValue. If the string is unknown, it returns the NoMatch type.

func (MatchTypeValue) MarshalJSON

func (mt MatchTypeValue) MarshalJSON() ([]byte, error)

MarshalJSON implements the json.Marshaler interface and converts a MatchTypeValue into a string.

func (MatchTypeValue) String

func (mt MatchTypeValue) String() string

String implements the fmt.Stringer interface and returns a string representation of a MatchTypeValue. The returned string can be converted back to a MatchTypeValue via the NewMatchType function.

func (*MatchTypeValue) UnmarshalJSON

func (mt *MatchTypeValue) UnmarshalJSON(bs []byte) error

UnmarshalJSON implements the json.Unmarshaler interface and converts a string into a MatchTypeValue.
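The four functions above form a string round trip for an integer enum. The sketch below reimplements that pattern locally to show how the pieces fit together. The string spellings in `matchTypeNames` are an assumption and may differ from what the package emits; only the fallback to NoMatch for unknown strings is taken from the documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MatchTypeValue mirrors the package type so the sketch is self-contained.
type MatchTypeValue int

const (
	NoMatch MatchTypeValue = iota
	PartialFuzzy
	PartialExact
	Fuzzy
	Exact
	Virus
	FacetedSearch
)

// matchTypeNames holds assumed string forms, indexed by constant value.
var matchTypeNames = []string{
	"NoMatch", "PartialFuzzy", "PartialExact", "Fuzzy",
	"Exact", "Virus", "FacetedSearch",
}

// NewMatchType converts a string to a MatchTypeValue. Unknown strings
// become NoMatch.
func NewMatchType(t string) MatchTypeValue {
	for i, s := range matchTypeNames {
		if s == t {
			return MatchTypeValue(i)
		}
	}
	return NoMatch
}

// String returns the string form; out-of-range values print as NoMatch.
func (mt MatchTypeValue) String() string {
	if mt < 0 || int(mt) >= len(matchTypeNames) {
		return matchTypeNames[NoMatch]
	}
	return matchTypeNames[mt]
}

// MarshalJSON encodes the value as a JSON string instead of a number.
func (mt MatchTypeValue) MarshalJSON() ([]byte, error) {
	return json.Marshal(mt.String())
}

// UnmarshalJSON decodes a JSON string back into a MatchTypeValue.
func (mt *MatchTypeValue) UnmarshalJSON(bs []byte) error {
	var s string
	if err := json.Unmarshal(bs, &s); err != nil {
		return err
	}
	*mt = NewMatchType(s)
	return nil
}

func main() {
	bs, err := json.Marshal(Fuzzy)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(bs))

	var mt MatchTypeValue
	if err := json.Unmarshal([]byte(`"Exact"`), &mt); err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println(mt == Exact, NewMatchType("something else") == NoMatch)
}
```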

type Meta added in v0.4.1

type Meta struct {
	// NamesNumber is the number of name-strings in the request.
	NamesNumber int `json:"namesNumber"`

	// WithAllSources indicates if `Results` will include all matched
	// sources.
	WithAllSources bool `json:"withAllSources,omitempty"`

	// WithAllMatches indicates if the response provides more than one result
	// per source, if such results were found.
	WithAllMatches bool `json:"withAllMatches,omitempty"`

	// WithContext indicates that the kingdom and convergence clade that contain
	// the majority of names will be calculated.
	WithContext bool `json:"withContext,omitempty"`

	// WithCapitalization is true if there was a request to capitalize input.
	WithCapitalization bool `json:"withCapitalization,omitempty"`

	// DataSources provides IDs of data-sources from the request.
	DataSources []int `json:"dataSources,omitempty"`

	// ContextThreshold provides the minimal percentage of names that a clade
	// should have to qualify as a Context clade.
	ContextThreshold float32 `json:"contextThreshold,omitempty"`

	// ContextNamesNum is the number of names qualified for the
	// context/kingdoms calculation.
	ContextNamesNum int `json:"contextNamesNum,omitempty"`

	// Context provides the lowest clade that contains most of the names from
	// the request.
	//
	// Non-matched names, or names that are not in Catalogue of Life are
	// not part of the calculation.
	Context string `json:"context,omitempty"`

	// ContextPercentage indicates the percentage of names that are placed
	// in the "context" clade. This number should be higher than
	// ContextThreshold unless Context is empty.
	ContextPercentage float32 `json:"contextPercentage,omitempty"`

	// Kingdom provides the kingdom that includes the majority of names from the
	// request according to the managerial classification of Catalogue of Life.
	//
	// Non-matched names, or names that are not in Catalogue of Life are
	// not part of the calculation.
	Kingdom string `json:"kingdom,omitempty"`

	// KingdomPercentage provides the percentage of names in the most
	// prevalent kingdom.
	//
	// Non-matched names, or names that are not in Catalogue of Life are
	// not part of the calculation.
	KingdomPercentage float32 `json:"kingdomPercentage,omitempty"`

	// Kingdoms provides all kingdoms with matched names and names distribution
	// between the kingdoms.
	Kingdoms []Kingdom `json:"kingdoms,omitempty"`
}

Meta is metadata of the request. It provides information about the parameters used for the request and, optionally, about the kingdom that contains most of the names from the request, as well as the lowest clade that contains the majority of the names.
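One way to picture the Context calculation is to walk down the classification level by level and stop when no clade holds the required share of names. The sketch below does that on invented, simplified classification paths. It illustrates the idea only and is not the algorithm used by the package; `contextClade` and its inputs are hypothetical, and the threshold is treated as a fraction between 0 and 1, following the 0.5 example in the Input documentation.

```go
package main

import "fmt"

// contextClade returns the lowest clade that contains at least `threshold`
// (a fraction from 0 to 1) of the given classification paths, together with
// that clade's share. Each path lists clades from the root down.
func contextClade(paths [][]string, threshold float32) (string, float32) {
	total := len(paths)
	if total == 0 {
		return "", 0
	}
	var context string
	var share float32
	current := paths
	for depth := 0; ; depth++ {
		// Count how many remaining paths pass through each clade at this depth.
		counts := make(map[string]int)
		for _, p := range current {
			if depth < len(p) {
				counts[p[depth]]++
			}
		}
		// Pick the most frequent clade; ties are broken alphabetically.
		best, bestN := "", 0
		for clade, n := range counts {
			if n > bestN || (n == bestN && clade < best) {
				best, bestN = clade, n
			}
		}
		frac := float32(bestN) / float32(total)
		if bestN == 0 || frac < threshold {
			return context, share
		}
		context, share = best, frac
		// Descend: keep only the paths that belong to the winning clade.
		var next [][]string
		for _, p := range current {
			if depth < len(p) && p[depth] == best {
				next = append(next, p)
			}
		}
		current = next
	}
}

func main() {
	// Simplified, hand-written paths for illustration only.
	paths := [][]string{
		{"Animalia", "Chordata", "Aves", "Bubo"},
		{"Animalia", "Chordata", "Aves", "Strix"},
		{"Animalia", "Chordata", "Mammalia", "Canis"},
		{"Plantae", "Tracheophyta", "Magnoliopsida", "Quercus"},
	}
	clade, share := contextClade(paths, 0.5)
	fmt.Println(clade, share)
}
```

With a threshold of 0.5 the walk stops at Aves, which holds two of the four names. Raising the threshold to 0.75 stops one level higher, at Chordata.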

type Name added in v0.4.1

type Name struct {
	// ID is a UUIDv5 generated out of the Input string.
	ID string `json:"id"`

	// Name is a verified name-string
	Name string `json:"name"`

	// Cardinality is the cardinality of input name:
	// 0 - No match, virus or hybrid formula,
	// 1 - Uninomial, 2 - Binomial, 3 - Trinomial etc.
	Cardinality int `json:"cardinality"`

	// MatchType is the best available match.
	MatchType MatchTypeValue `json:"matchType"`

	// BestResult is the best result according to GNames scoring.
	BestResult *ResultData `json:"bestResult,omitempty"`

	// Results contain all detected matches from preferred data sources
	// provided by the user.
	Results []*ResultData `json:"results,omitempty"`

	// DataSourcesNum is the number of data sources that matched an
	// input name-string.
	DataSourcesNum int `json:"dataSourcesNum,omitempty"`

	// Curation estimates the reliability of matched data sources. It
	// indicates whether matches are returned by at least one manually curated
	// data source, by at least one automatically curated data source, or only
	// by sources that are not significantly curated.
	Curation CurationLevel `json:"curation"`

	// OverloadDetected might be triggered if a virus name or a canonical name
	// contains many variations and/or strains. In this case not all data are
	// queried.
	OverloadDetected string `json:"overloadDetected,omitempty"`

	// Error provides an error message, if any. If error is not empty, the match
	// failed because of a bug in the service.
	Error string `json:"error,omitempty"`
}

Name is a result of verification of one name-string from the input.

func (Name) Clades added in v0.4.1

func (n Name) Clades() []context.Clade

type Output added in v0.5.2

type Output struct {
	// Meta is metadata of the request.
	Meta `json:"metadata"`
	// Names from the request.
	Names []Name `json:"names"`
}

Output is a result returned by Verify method.
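Because Meta is embedded in Output with the tag `json:"metadata"`, encoding/json treats it as a field named "metadata" while Go code can still reach its fields directly on the Output value. The sketch below decodes an invented payload into trimmed local copies of the types. The JSON is made up for illustration and is not real service output, `matchType` is decoded as a plain string for brevity, and `decodeOutput` and `summarize` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed local mirrors of Meta, ResultData, Name and Output.
type Meta struct {
	NamesNumber int `json:"namesNumber"`
}

type ResultData struct {
	DataSourceID int    `json:"dataSourceId"`
	MatchedName  string `json:"matchedName"`
	CurrentName  string `json:"currentName"`
	IsSynonym    bool   `json:"isSynonym"`
}

type Name struct {
	Name       string      `json:"name"`
	MatchType  string      `json:"matchType"`
	BestResult *ResultData `json:"bestResult,omitempty"`
	Error      string      `json:"error,omitempty"`
}

type Output struct {
	Meta  `json:"metadata"`
	Names []Name `json:"names"`
}

// sample is invented JSON shaped like an Output.
const sample = `{
  "metadata": {"namesNumber": 2},
  "names": [
    {"name": "Bubo bubo", "matchType": "Exact",
     "bestResult": {"dataSourceId": 1,
                    "matchedName": "Bubo bubo (Linnaeus, 1758)",
                    "currentName": "Bubo bubo (Linnaeus, 1758)",
                    "isSynonym": false}},
    {"name": "Not a name", "matchType": "NoMatch"}
  ]
}`

// decodeOutput parses JSON into an Output.
func decodeOutput(data []byte) (Output, error) {
	var out Output
	err := json.Unmarshal(data, &out)
	return out, err
}

// summarize produces one line per name: an error, "no match", or the
// matched name of the best result.
func summarize(out Output) []string {
	lines := make([]string, 0, len(out.Names))
	for _, n := range out.Names {
		switch {
		case n.Error != "":
			lines = append(lines, n.Name+": error: "+n.Error)
		case n.BestResult == nil:
			lines = append(lines, n.Name+": no match")
		default:
			lines = append(lines, n.Name+": "+n.BestResult.MatchedName)
		}
	}
	return lines
}

func main() {
	out, err := decodeOutput([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// NamesNumber is promoted from the embedded Meta.
	fmt.Println("names in request:", out.NamesNumber)
	for _, line := range summarize(out) {
		fmt.Println(line)
	}
}
```

Checking BestResult for nil before use matters because the field is a pointer tagged with `omitempty` and is absent for names that did not match.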

type ResultData

type ResultData struct {
	// DataSourceID is the ID of a matched DataSource.
	DataSourceID int `json:"dataSourceId"`

	// Shortened/abbreviated title of the data source.
	DataSourceTitleShort string `json:"dataSourceTitleShort"`

	// Curation of the data source.
	Curation CurationLevel `json:"curation"`

	// RecordID from a data source. We try our best to return an ID that
	// corresponds to dwc:taxonID of a DataSource. If such an ID is not provided,
	// this ID will be auto-generated. Auto-generated IDs will have 'gn_' prefix.
	RecordID string `json:"recordId"`

	// GlobalID that is exposed globally by a DataSource. Such IDs are usually
	// self-resolving, for example LSID, pURL, DOI etc.
	GlobalID string `json:"globalId,omitempty"`

	// LocalID used by a DataSource internally. If an OutLink field is provided,
	// LocalID serves as a 'dynamic' component of the URL.
	LocalID string `json:"localId,omitempty"`

	// Outlink to the record in the DataSource. It consists of a 'stable'
	// URL and an appended 'dynamic' LocalID
	Outlink string `json:"outlink,omitempty"`

	// EntryDate is a timestamp created on entry of the data.
	EntryDate string `json:"entryDate"`

	// Score indicates how well the match worked. It is used to determine
	// the best match overall, and the best match for every data-source.
	Score uint32 `json:"-"`

	// ParsingQuality determines how well gnparser was able to break the
	// name-string to its components. 0 - no parse, 1 - clean parse,
	// 2 - some problems, 3 - significant problems.
	ParsingQuality int `json:"-"`

	// MatchedName is a name-string from the DataSource that was matched
	// by GNames algorithm.
	MatchedName string `json:"matchedName"`

	// MatchedCardinality is the cardinality of the returned name:
	// 0 - No match, virus or hybrid formula,
	// 1 - Uninomial, 2 - Binomial, 3 - Trinomial etc.
	MatchedCardinality int `json:"matchedCardinality"`

	// MatchedCanonicalSimple is a simplified canonical form without ranks for
	// names lower than species, and with omitted hybrid signs for named hybrids.
	// Quite often simple canonical is the same as full canonical. Hybrid signs
	// are preserved for hybrid formulas.
	MatchedCanonicalSimple string `json:"matchedCanonicalSimple,omitempty"`

	// MatchedCanonicalFull is a canonical form that preserves hybrid signs
	// and infraspecific ranks.
	MatchedCanonicalFull string `json:"matchedCanonicalFull,omitempty"`

	// MatchedAuthors is a list of authors mentioned in the name.
	MatchedAuthors []string `json:"-"`

	// MatchedYear is a year mentioned in the name. Multiple years or
	// approximate years are ignored.
	MatchedYear int `json:"-"`

	// CurrentRecordID is the id of currently accepted name given by
	// the data-source.
	CurrentRecordID string `json:"currentRecordId"`

	// CurrentName is a currently accepted name (it is only provided by
	// DataSources with taxonomic data).
	CurrentName string `json:"currentName"`

	// CurrentCardinality is a cardinality of the accepted name.
	// It might differ from the matched name cardinality.
	CurrentCardinality int `json:"currentCardinality"`

	// CurrentCanonicalSimple is a canonical form for the currently accepted name.
	CurrentCanonicalSimple string `json:"currentCanonicalSimple"`

	// CurrentCanonicalFull is a full version of the canonical form for the
	// currently accepted name.
	CurrentCanonicalFull string `json:"currentCanonicalFull"`

	// IsSynonym is true if there is an indication in the DataSource that the
	// name is not a currently accepted name for one or another reason.
	IsSynonym bool `json:"isSynonym"`

	// ClassificationPath to the name (if provided by the DataSource).
	// Classification path consists of a hierarchy of name-strings.
	ClassificationPath string `json:"classificationPath,omitempty"`

	// ClassificationRanks of the classification path. They follow the
	// same order as the classification path.
	ClassificationRanks string `json:"classificationRanks,omitempty"`

	// ClassificationIDs of the name-strings. They always correspond to
	// the "id" field.
	ClassificationIDs string `json:"classificationIds,omitempty"`

	// EditDistance is a Levenshtein edit distance between canonical form of the
	// input name-string and the matched canonical form. If match type is
	// "EXACT", edit-distance will be 0.
	EditDistance int `json:"editDistance"`

	// StemEditDistance is a Levenshtein edit distance after removing suffixes
	// from specific epithets from canonical forms.
	StemEditDistance int `json:"stemEditDistance"`

	// MatchType describes what kind of a match happened to a name-string.
	MatchType MatchTypeValue `json:"matchType"`

	// ScoreDetails provides data about matching of authors, year, rank,
	// parsingQuality...
	ScoreDetails `json:"scoreDetails"`

	// Vernacular names that correspond to the matched name. (Will be implemented
	// later)
	Vernaculars []Vernacular `json:"vernaculars,omitempty"`
}

ResultData holds the data returned in the `BestResult` or `Results` fields of a name verification.
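EditDistance is described as a Levenshtein distance between two canonical forms. For readers unfamiliar with the metric, here is a textbook two-row dynamic-programming sketch that works on runes. It is a generic implementation, not the one used by the package, and `levenshtein` and `min3` are names chosen for this example.

```go
package main

import "fmt"

// min3 returns the smallest of three integers.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-rune insertions, deletions and
// substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	// Distance from the empty prefix of a to each prefix of b.
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // deletion
				curr[j-1]+1,    // insertion
				prev[j-1]+cost, // substitution or match
			)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	// Identical canonical forms have distance 0, as for an exact match.
	fmt.Println(levenshtein("Pomatomus saltatrix", "Pomatomus saltatrix"))
	// One substituted letter gives distance 1.
	fmt.Println(levenshtein("Pomatomus saltatrax", "Pomatomus saltatrix"))
}
```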

type ScoreDetails added in v0.6.5

type ScoreDetails struct {
	// InfraSpecificRankScore matches infraspecific rank. For example if a
	// query name is `Aus bus var. cus`, and the match has the same rank,
	// this field is 1.
	InfraSpecificRankScore float32 `json:"infraSpecificRankScore"`

	// FuzzyLessScore scores edit distance for fuzzy matching. If edit distance
	// is 0 the score is maxed to 1.
	FuzzyLessScore float32 `json:"fuzzyLessScore"`

	// CuratedDataScore scores highest if the matched data-source is known for
	// having a significant manual curation effort of the data.
	CuratedDataScore float32 `json:"curatedDataScore"`

	// AuthorMatchScore tries to match authors and years in the name. If
	// a year and all authors match, the score is 1.
	AuthorMatchScore float32 `json:"authorMatchScore"`

	// AcceptedNameScore is a binary field, if matched name is also currently
	// accepted name according to the data-source, the value is 1.
	AcceptedNameScore float32 `json:"acceptedNameScore"`

	// ParsingQualityScore is the highest for matched names that were parsed
	// without any problems.
	ParsingQualityScore float32 `json:"parsingQualityScore"`
}

ScoreDetails explains how results are sorted and why a particular result was selected as the `BestResult`. The score for every item is normalized to a range from 0 to 1, where 0 means there was no match by that factor and 1 means a "perfect" match. Fields located higher on the list have more weight than lower fields: lower fields are taken into account only if higher fields provide equal values.
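The weighting rule described here amounts to a lexicographic comparison: the first field that differs decides the order. The sketch below ranks a few invented candidates that way. It assumes the field order of the struct is the priority order, and it is only an illustration; the package stores a separate numeric Score and may compute the ranking differently. The names `outranks`, `candidate` and `rank` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// ScoreDetails is a local copy of the package struct without JSON tags.
type ScoreDetails struct {
	InfraSpecificRankScore float32
	FuzzyLessScore         float32
	CuratedDataScore       float32
	AuthorMatchScore       float32
	AcceptedNameScore      float32
	ParsingQualityScore    float32
}

// ordered lists the scores from the highest-priority field to the lowest.
func (s ScoreDetails) ordered() [6]float32 {
	return [6]float32{
		s.InfraSpecificRankScore, s.FuzzyLessScore, s.CuratedDataScore,
		s.AuthorMatchScore, s.AcceptedNameScore, s.ParsingQualityScore,
	}
}

// outranks reports whether a should be placed before b: the first field
// that differs decides, and a higher score wins.
func outranks(a, b ScoreDetails) bool {
	as, bs := a.ordered(), b.ordered()
	for i := range as {
		if as[i] != bs[i] {
			return as[i] > bs[i]
		}
	}
	return false
}

// candidate pairs a label with its score details.
type candidate struct {
	Label  string
	Scores ScoreDetails
}

// rank returns a sorted copy of the candidates, best first.
func rank(cs []candidate) []candidate {
	sorted := append([]candidate(nil), cs...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return outranks(sorted[i].Scores, sorted[j].Scores)
	})
	return sorted
}

func main() {
	cs := []candidate{
		{"A", ScoreDetails{InfraSpecificRankScore: 1, FuzzyLessScore: 1, CuratedDataScore: 0.5}},
		{"B", ScoreDetails{InfraSpecificRankScore: 1, FuzzyLessScore: 1, CuratedDataScore: 1}},
		{"C", ScoreDetails{FuzzyLessScore: 1, CuratedDataScore: 1, AuthorMatchScore: 1}},
	}
	for _, c := range rank(cs) {
		fmt.Println(c.Label)
	}
}
```

Candidate C has a perfect author match but still ranks last, because it loses on the first field and later fields are never consulted once an earlier one differs.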

type Vernacular

type Vernacular struct {
	Name string `json:"name"`

	// Language of the name, hopefully in ISO form.
	Language string `json:"language,omitempty"`

	// Locality is the geographic place where the name is used.
	Locality string `json:"locality,omitempty"`
}

Vernacular is a vernacular (common) name that corresponds to a matched name.
