Documentation ¶
Types ¶
type CurationLevel ¶
type CurationLevel int
CurationLevel indicates the highest curation category among the DataSources that returned a match.
const (
	// NotCurated means that all DataSources where the name-string was matched
	// are not curated sufficiently.
	NotCurated CurationLevel = iota

	// AutoCurated means that at least one of the returned DataSources invested
	// significantly in curating their data by scripts.
	AutoCurated

	// Curated means that at least one DataSource is marked as sufficiently
	// curated. It does not mean that the particular match was manually checked
	// though.
	Curated
)
func (CurationLevel) MarshalJSON ¶
func (cl CurationLevel) MarshalJSON() ([]byte, error)
MarshalJSON implements the json.Marshaler interface and converts a CurationLevel into a string.
func (CurationLevel) String ¶
func (cl CurationLevel) String() string
func (*CurationLevel) UnmarshalJSON ¶
func (cl *CurationLevel) UnmarshalJSON(bs []byte) error
UnmarshalJSON implements the json.Unmarshaler interface and converts a string into a CurationLevel.
type DataSource ¶
type DataSource struct {
	// ID is a DataSource Id.
	ID int `json:"id"`

	// UUID generated by GlobalNames and associated with the DataSource
	UUID string `json:"uuid,omitempty"`

	// Title is a full title of a DataSource
	Title string `json:"title"`

	// TitleShort is a shortened/abbreviated title of a DataSource.
	TitleShort string `json:"titleShort"`

	// Version of the data-set for a DataSource.
	Version string `json:"version,omitempty"`

	// RevisionDate of a data-set from a data-provider.
	// It follows format of 'YYYY-MM-DD' || 'YYYY-MM' || 'YYYY'
	// This data comes from the information given by the data-provider,
	// while UpdatedAt field is the date of harvesting of the
	// resource.
	RevisionDate string `json:"releaseDate,omitempty"`

	// DOI of a DataSource;
	DOI string `json:"doi,omitempty"`

	// Citation representing a DataSource
	Citation string `json:"citation,omitempty"`

	// Authors associated with the DataSource
	Authors string `json:"authors,omitempty"`

	// Description of the DataSource.
	Description string `json:"description,omitempty"`

	// WebsiteURL is a homepage of a DataSource
	WebsiteURL string `json:"homeURL,omitempty"`

	// OutlinkURL is a template for generating outlink URLs. Verification
	// output will substitute '{}' with an OutlinkID
	OutlinkURL string `json:"-"`

	// IsOutlinkReady is true for data-sources that have enough data and
	// metadata to be recommended for outlinking by third-party applications
	// (be included into data-sources). When false, it does not
	// mean that the original resource is not valuable, it means that
	// its representation at gnames is not complete/recent enough.
	IsOutlinkReady bool `json:"isOutlinkReady,omitempty"`

	// Curation determines how much manual or programmatic work is put
	// into assuring the quality of the data.
	Curation CurationLevel `json:"curation"`

	// HasTaxonData is true if a DataSource has data about currently
	// accepted names and synonyms.
	HasTaxonData bool `json:"hasTaxonData"`

	// RecordCount tells how many entries are in a DataSource.
	RecordCount int `json:"recordCount"`

	// UpdatedAt is the last import date (YYYY-MM-DD). In contrast,
	// RevisionDate field indicates when the resource was
	// updated according to its data-provider.
	UpdatedAt string `json:"updatedAt"`
}
DataSource provides metadata for an externally collected data-set.
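The OutlinkURL comment says that '{}' in the template is substituted with an OutlinkID. A minimal sketch of that substitution is shown here; the function name `outlink`, the example URL, and the choice to return an empty string when either part is missing are assumptions, not the package's behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// outlink fills the first '{}' placeholder of a template with a record's ID.
// Returning "" for a missing template or ID is an assumption of this sketch.
func outlink(template, id string) string {
	if template == "" || id == "" {
		return ""
	}
	return strings.Replace(template, "{}", id, 1)
}

func main() {
	// The URL below is a made-up example, not a real data-source template.
	fmt.Println(outlink("https://example.org/taxon/{}", "12345"))
}
```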
type DataSourceDetails ¶ added in v0.21.0
type DataSourceDetails struct {
	// DataSourceID is the ID of the DataSource in GNverifier.
	DataSourceID int `json:"dataSourceId"`

	// TitleShort is the short name of the resource.
	TitleShort string `json:"title"`

	// Match is the collection of found records for the data-source.
	Match MatchShort `json:"match"`
}
DataSourceDetails describes a data-source and the best match found in it.
type Input ¶ added in v0.4.3
type Input struct {
	// NameStrings is a list of name-strings to verify.
	NameStrings []string `json:"nameStrings"`

	// DataSources field contains DataSources IDs to limit results to only these
	// sources. The best result is calculated only out of this limited set of
	// data. By default only the BestResult is shown. To see all results use
	// WithAllMatches flag.
	DataSources []int `json:"dataSources"`

	// WithAllMatches provides all results, instead of only the BestResult.
	// The results are sorted by score, not by data-source. The top result is
	// the best result.
	WithAllMatches bool `json:"withAllMatches"`

	// WithVernaculars indicates if corresponding vernacular results will be
	// returned as well.
	WithVernaculars bool `json:"withVernaculars"`

	// WithCapitalization flag; when true, the first rune of lower-case
	// input name-strings will be capitalized if appropriate.
	WithCapitalization bool `json:"withCapitalization"`

	// WithSpeciesGroup flag; when true, species names also get matched by
	// their species group. It means that the request will take into account
	// botanical autonyms and zoological coordinated names.
	WithSpeciesGroup bool `json:"withSpeciesGroup"`

	// WithRelaxedFuzzyMatch flag; when true, the fuzzy matching rules are
	// relaxed. Normally it is switched off to decrease the number of false
	// positives and make verification faster.
	WithRelaxedFuzzyMatch bool `json:"withRelaxedFuzzyMatch"`

	// WithUninomialFuzzyMatch flag; when true, uninomial names are not
	// restricted from fuzzy matching. Normally it creates too many false
	// positives and is switched off.
	WithUninomialFuzzyMatch bool `json:"withUninomialFuzzyMatch"`

	// WithStats flag; when true, results will return the most prevalent
	// kingdom for the text, as well as the taxon which contains a given
	// percentage of all names in the text (MainTaxon).
	//
	// For example MainTaxon with the MainTaxonThreshold of 0.5 would correspond
	// to a taxon that contains at least half of all names. We use the
	// managerial classification of Catalogue of Life for the MainTaxon
	// calculation.
	WithStats bool `json:"withStats"`

	// MainTaxonThreshold sets the minimal percentage of names in a taxon
	// to be counted as a MainTaxon of a text. This field is ignored if
	// WithStats is false.
	//
	// MainTaxon is a taxon that contains at least MainTaxonThreshold percentage
	// of all names (genus and below) in the text. We use the managerial
	// classification of Catalogue of Life for the MainTaxon calculation.
	MainTaxonThreshold float32 `json:"mainTaxonThreshold"`
}
Input holds the options and parameters for the Verify method.
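As a rough illustration of the JSON shape these tags produce, the sketch below encodes a trimmed local copy of the struct. The type `input` and helper `encode` are hypothetical, only three of the fields are mirrored, and the data-source IDs 1 and 11 are arbitrary placeholder values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// input mirrors a few fields of Input, reusing the JSON tags listed above.
// It is a local stand-in, not the package's type.
type input struct {
	NameStrings    []string `json:"nameStrings"`
	DataSources    []int    `json:"dataSources"`
	WithAllMatches bool     `json:"withAllMatches"`
}

// encode returns the JSON form of the request options.
func encode(in input) (string, error) {
	bs, err := json.Marshal(in)
	if err != nil {
		return "", fmt.Errorf("cannot encode input: %w", err)
	}
	return string(bs), nil
}

func main() {
	in := input{
		NameStrings: []string{"Pomatomus saltatrix", "Bubo bubo"},
		DataSources: []int{1, 11}, // placeholder IDs
	}
	s, err := encode(in)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(s)
}
```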
type Kingdom ¶ added in v0.4.2
type Kingdom struct {
	// KingdomName is the name of a kingdom.
	KingdomName string `json:"kingdomName"`

	// NamesNumber is the number of names found in a kingdom.
	NamesNumber int `json:"namesNumber"`

	// Percentage is a percentage of names found in a kingdom.
	Percentage float32 `json:"percentage"`
}
Kingdom provides statistics of matched names found in a particular kingdom.
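A minimal sketch of how such per-kingdom statistics could be derived from name counts follows. The type `kingdom`, the function `kingdomStats`, the descending sort order, and the use of a 0 to 1 fraction for Percentage are all assumptions made for illustration; this page does not state the scale or the ordering used by the service.

```go
package main

import (
	"fmt"
	"sort"
)

// kingdom is a local stand-in with the same fields as Kingdom.
type kingdom struct {
	KingdomName string
	NamesNumber int
	Percentage  float32
}

// kingdomStats turns per-kingdom name counts into sorted statistics.
// Percentage is expressed as a fraction of all counted names (assumption).
func kingdomStats(counts map[string]int) []kingdom {
	total := 0
	for _, n := range counts {
		total += n
	}
	res := make([]kingdom, 0, len(counts))
	if total == 0 {
		return res
	}
	for name, n := range counts {
		res = append(res, kingdom{name, n, float32(n) / float32(total)})
	}
	// Most populated kingdom first; ties are broken alphabetically.
	sort.Slice(res, func(i, j int) bool {
		if res[i].NamesNumber != res[j].NamesNumber {
			return res[i].NamesNumber > res[j].NamesNumber
		}
		return res[i].KingdomName < res[j].KingdomName
	})
	return res
}

func main() {
	for _, k := range kingdomStats(map[string]int{"Animalia": 3, "Plantae": 1}) {
		fmt.Printf("%s: %d names, share %.2f\n", k.KingdomName, k.NamesNumber, k.Percentage)
	}
}
```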
type MatchShort ¶ added in v0.21.0
type MatchShort struct {
	// RecordID is the RecordID of a name-string.
	RecordID string `json:"recordId,omitempty"`

	// Outlink is the link to a match in the original data-source.
	Outlink string `json:"outlink,omitempty"`

	// NameString is verbatim representation of a name.
	NameString string `json:"nameString"`

	// AuthScore provides information about AuthorScore from sorting algorithm.
	AuthScore bool `json:"-"`
}
MatchShort contains data of a matched record for a resource.
type MatchTypeValue ¶
type MatchTypeValue int
MatchTypeValue describes how a name-string matched a name in gnames database.
const (
	// NoMatch means that matching failed.
	NoMatch MatchTypeValue = iota

	// PartialFuzzy is the same as PartialExact, but also the match was not
	// exact. We never do fuzzy matches for uninomials, due to high rate of false
	// positives.
	PartialFuzzy

	// PartialFuzzyRelaxed is the same as PartialFuzzy, but the fuzzy match
	// rules were relaxed. This brings more false positives, but also more
	// matches.
	PartialFuzzyRelaxed

	// PartialExact is used if GNames failed to match the full name-string. The
	// match happened by removing either middle species epithets, or by chopping
	// the 'tail' words of the input name-string canonical form.
	PartialExact

	// Fuzzy means that matches were not exact due to similarity of name-strings,
	// OCR or typing errors. Take these results with more suspicion than
	// Exact matches. Fuzzy match is never done on uninomials due to the
	// high rate of false positives.
	Fuzzy

	// FuzzyRelaxed is the same as Fuzzy, but the fuzzy match rules were relaxed.
	// This brings more false positives, but also more matches.
	FuzzyRelaxed

	// FuzzySpeciesGroup means that match happened not with the name, but
	// with either an autonym (botany)/coordinated name (zoology) of species,
	// or binomial part of a trinomial.
	FuzzySpeciesGroup

	// FuzzySpeciesGroupRelaxed is the same as FuzzySpeciesGroup, but the fuzzy
	// match rules were relaxed. This brings more false positives, but also more
	// matches.
	FuzzySpeciesGroupRelaxed

	// Exact means either canonical form, or the whole name-string matched
	// perfectly.
	Exact

	// ExactSpeciesGroup means that match happened not with the name, but
	// with either an autonym (botany)/coordinated name (zoology) of species,
	// or binomial part of a trinomial.
	ExactSpeciesGroup

	// Virus names are matched in the database. `Virus` is a wide
	// term and includes a variety of non-cellular terms (virus, prion, plasmid,
	// vector etc.)
	Virus

	// FacetedSearch is a match made by search procedure. It does not happen
	// during verification.
	FacetedSearch
)
func NewMatchType ¶
func NewMatchType(t string) MatchTypeValue
NewMatchType takes a string and converts it into a MatchTypeValue. If the string is unknown, it returns NoMatch.
func (MatchTypeValue) MarshalJSON ¶
func (mt MatchTypeValue) MarshalJSON() ([]byte, error)
MarshalJSON implements the json.Marshaler interface and converts a MatchTypeValue into a string.
func (MatchTypeValue) String ¶
func (mt MatchTypeValue) String() string
String implements the fmt.Stringer interface and returns a string representation of a MatchTypeValue. The returned string can be converted back to a MatchTypeValue via the NewMatchType function.
func (*MatchTypeValue) UnmarshalJSON ¶
func (mt *MatchTypeValue) UnmarshalJSON(bs []byte) error
UnmarshalJSON implements the json.Unmarshaler interface and converts a string into a MatchTypeValue.
type Meta ¶ added in v0.4.1
type Meta struct {
	// NamesNumber is the number of name-strings in the request.
	NamesNumber int `json:"namesNumber"`

	// WithAllSources indicates if `Results` will include all matched
	// sources.
	WithAllSources bool `json:"withAllSources,omitempty"`

	// WithAllMatches indicates if response provides more than one result
	// per source, if such results were found.
	WithAllMatches bool `json:"withAllMatches,omitempty"`

	// WithStats indicates that the kingdom and a taxon that contain
	// majority of names (MainTaxon) will be calculated.
	WithStats bool `json:"withStats,omitempty"`

	// WithCapitalization is true if there was a request to capitalize input
	WithCapitalization bool `json:"withCapitalization,omitempty"`

	// WithSpeciesGroup is true if Input included `WithSpeciesGroup` option.
	WithSpeciesGroup bool `json:"withSpeciesGroup,omitempty"`

	// WithRelaxedFuzzyMatch is true if Input included `WithRelaxedFuzzyMatch`
	// option. It means that the fuzzy matching rules are relaxed.
	// Normally it is switched off to decrease the number of false positives.
	WithRelaxedFuzzyMatch bool `json:"withRelaxedFuzzyMatch,omitempty"`

	// WithUninomialFuzzyMatch is true when uninomial names go
	// through fuzzy matching. Normally it is switched off to decrease the
	// number of false positives.
	WithUninomialFuzzyMatch bool `json:"withUninomialFuzzyMatch,omitempty"`

	// DataSources provides IDs of data-sources from the request.
	DataSources []int `json:"dataSources,omitempty"`

	// MainTaxonThreshold provides a minimal percentage of names that a taxon
	// should have to be qualified as a MainTaxon.
	MainTaxonThreshold float32 `json:"mainTaxonThreshold,omitempty"`

	// StatsNamesNum is the number of names qualified for MainTaxon/Kingdoms
	// calculation.
	StatsNamesNum int `json:"statsNamesNum,omitempty"`

	// MainTaxon provides the lowest taxon that contains most of the names from
	// the request.
	//
	// Non-matched names, names that are not in the Catalogue of Life, names
	// higher than genus are not part of the calculation.
	MainTaxon string `json:"mainTaxon,omitempty"`

	// MainTaxonPercentage indicates the percentage of names that are placed
	// in the MainTaxon. This number should be higher than
	// MainTaxonThreshold unless MainTaxon is empty.
	MainTaxonPercentage float32 `json:"mainTaxonPercentage,omitempty"`

	// Kingdom provides what kingdom includes the majority of names from the
	// request according to the managerial classification of Catalogue of Life.
	//
	// Non-matched names, or names that are not in Catalogue of Life are
	// not part of the calculation.
	Kingdom string `json:"kingdom,omitempty"`

	// KingdomPercentage provides the percentage of names in the most
	// prevalent kingdom.
	//
	// Non-matched names, or names that are not in Catalogue of Life are
	// not part of the calculation.
	KingdomPercentage float32 `json:"kingdomPercentage,omitempty"`

	// Kingdoms provides all kingdoms with matched names and names distribution
	// between the kingdoms.
	Kingdoms []Kingdom `json:"kingdoms,omitempty"`
}
Meta is the metadata of the request. It provides information about the parameters used for the request and, optionally, about the kingdom that contains most of the names from the request, as well as the lowest taxon that contains the majority of the names.
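To make the MainTaxon idea concrete, here is one possible way to find the lowest taxon whose share of names reaches a threshold. It is a sketch under stated assumptions, not the service's algorithm: `mainTaxon` is a hypothetical function, each name is represented by a root-to-leaf classification path, the threshold is a 0 to 1 fraction compared with `>=`, and ties at the same depth are broken by count and then alphabetically.

```go
package main

import (
	"fmt"
	"strings"
)

// mainTaxon returns the deepest taxon that contains at least `threshold`
// (a 0..1 fraction) of the given classification paths, and its share.
// It returns "" and 0 when no taxon qualifies.
func mainTaxon(paths [][]string, threshold float32) (string, float32) {
	if len(paths) == 0 {
		return "", 0
	}
	type node struct {
		name         string
		depth, count int
	}
	// Count names under every taxon; the key is the full path to the taxon,
	// so equally named taxa in different branches stay separate.
	nodes := map[string]*node{}
	for _, p := range paths {
		for i := range p {
			key := strings.Join(p[:i+1], "|")
			if nodes[key] == nil {
				nodes[key] = &node{name: p[i], depth: i}
			}
			nodes[key].count++
		}
	}
	var best *node
	for _, n := range nodes {
		if float32(n.count)/float32(len(paths)) < threshold {
			continue
		}
		deeper := best == nil || n.depth > best.depth
		tieBreak := best != nil && n.depth == best.depth &&
			(n.count > best.count || (n.count == best.count && n.name < best.name))
		if deeper || tieBreak {
			best = n
		}
	}
	if best == nil {
		return "", 0
	}
	return best.name, float32(best.count) / float32(len(paths))
}

func main() {
	paths := [][]string{
		{"Animalia", "Chordata", "Aves"},
		{"Animalia", "Chordata", "Aves"},
		{"Animalia", "Chordata", "Mammalia"},
		{"Plantae", "Tracheophyta", "Magnoliopsida"},
	}
	taxon, share := mainTaxon(paths, 0.6)
	fmt.Println(taxon, share)
}
```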
type Name ¶ added in v0.4.1
type Name struct {
	// ID is a UUIDv5 generated out of the Input string.
	ID string `json:"id"`

	// Name is a verified name-string
	Name string `json:"name"`

	// Cardinality is the cardinality of input name:
	// 0 - No match, virus or hybrid formula,
	// 1 - Uninomial, 2 - Binomial, 3 - Trinomial etc.
	Cardinality int `json:"cardinality"`

	// MatchType is best available match.
	MatchType MatchTypeValue `json:"matchType"`

	// BestResult is the best result according to GNames scoring.
	BestResult *ResultData `json:"bestResult,omitempty"`

	// Results contain all detected matches from preferred data sources
	// provided by user.
	Results []*ResultData `json:"results,omitempty"`

	// DataSourcesNum is a number of data sources that matched an
	// input name-string.
	DataSourcesNum int `json:"dataSourcesNum,omitempty"`

	DataSourcesIDs []int `json:"dataSourcesIds,omitempty"`

	// DataSourcesDetails contains information about matched data-sources
	// and the IDs of records that did match.
	DataSourcesDetails []DataSourceDetails `json:"dataSourcesDetails,omitempty"`

	// Curation estimates the reliability of matched data sources. It tells
	// whether matches are returned by at least one manually curated data
	// source, by an automatically curated data source, or only by sources
	// that are not significantly curated.
	Curation CurationLevel `json:"curation"`

	// OverloadDetected might be triggered if a virus name or a canonical name
	// contain many variations and/or strains. In this case not all data are
	// queried.
	OverloadDetected string `json:"overloadDetected,omitempty"`

	// Error provides an error message, if any. If error is not empty, the match
	// failed because of a bug in the service.
	Error string `json:"error,omitempty"`
}
Name is a result of verification of one name-string from the input.
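The ID field is described as a UUID v5 derived from the input string. The sketch below shows how a version 5 UUID is computed in general (SHA-1 over a namespace UUID followed by the name). The function name `uuidV5` is hypothetical, and the RFC 4122 DNS namespace is used purely as a placeholder: this page does not say which namespace GNames uses, so IDs produced here should not be expected to match the service's IDs.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// dnsNamespace is the RFC 4122 DNS namespace UUID
// (6ba7b810-9dad-11d1-80b4-00c04fd430c8). It is a PLACEHOLDER here;
// the namespace used by GNames is not stated in this documentation.
var dnsNamespace = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 computes a version 5 (SHA-1, name-based) UUID for a name.
func uuidV5(ns [16]byte, name string) string {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum)             // keep the first 16 bytes of the digest
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant

	s := hex.EncodeToString(u[:])
	return s[0:8] + "-" + s[8:12] + "-" + s[12:16] + "-" + s[16:20] + "-" + s[20:32]
}

func main() {
	fmt.Println(uuidV5(dnsNamespace, "Pomatomus saltatrix"))
}
```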
type NameStringInput ¶ added in v0.15.0
type NameStringInput struct {
	// ID is the UUID v5 generated from a name-string
	ID string

	// DataSources is a slice of DataSourceIDs which should be used in the
	// output. If the slice is empty, all DataSources are used.
	DataSources []int

	// WithAllMatches controls if only the BestMatch, or all possible matches
	// are returned.
	WithAllMatches bool
}
NameStringInput is used to get information about a particular name-string.
type NameStringMeta ¶ added in v0.15.0
type NameStringMeta struct {
	// ID is the UUID v5 generated for a particular name-string.
	ID string `json:"id"`

	// DataSources is a slice of DataSource IDs. If it is not empty,
	// the output results will be constrained to these IDs.
	DataSources []int `json:"dataSources,omitempty"`

	// WithAllMatches indicates if all matches should be returned, or
	// only the best matches.
	WithAllMatches bool `json:"withAllMatches"`
}
NameStringMeta contains metadata from the provided input.
type NameStringOutput ¶ added in v0.15.0
type NameStringOutput struct {
	// NameStringMeta contains metadata from the input.
	NameStringMeta `json:"meta"`

	// Name is the found name data.
	*Name `json:"name"`
}
NameStringOutput contains data corresponding to the provided name-string ID.
type Output ¶ added in v0.5.2
type Output struct {
	// Meta is the metadata of the request results.
	Meta `json:"metadata"`

	// Names are results of name-verification.
	Names []Name `json:"names"`
}
Output is the result returned by the Verify method.
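A client that does not import this package can still read such a response by mirroring the JSON tags. The sketch below decodes a small hand-written payload; the types `output`, `meta`, and `name`, the function `parseOutput`, and the sample JSON (including the `"Exact"` spelling of the match type) are illustrative assumptions, not captured service output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins that mirror a subset of the JSON tags listed above.
type meta struct {
	NamesNumber int `json:"namesNumber"`
}

type name struct {
	Name        string `json:"name"`
	Cardinality int    `json:"cardinality"`
	// MatchType is kept as a plain string; its exact spelling is an assumption.
	MatchType string `json:"matchType"`
}

type output struct {
	Meta  meta   `json:"metadata"`
	Names []name `json:"names"`
}

// sample is a hand-written illustration, not real service output.
const sample = `{
  "metadata": {"namesNumber": 1},
  "names": [{"name": "Bubo bubo", "cardinality": 2, "matchType": "Exact"}]
}`

// parseOutput decodes a verification response into the local structs.
func parseOutput(s string) (output, error) {
	var out output
	err := json.Unmarshal([]byte(s), &out)
	return out, err
}

func main() {
	out, err := parseOutput(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, n := range out.Names {
		fmt.Printf("%s (cardinality %d): %s\n", n.Name, n.Cardinality, n.MatchType)
	}
}
```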
type ResultData ¶
type ResultData struct {
	// DataSourceID is the ID of a matched DataSource.
	DataSourceID int `json:"dataSourceId"`

	// Shortened/abbreviated title of the data source.
	DataSourceTitleShort string `json:"dataSourceTitleShort"`

	// Curation of the data source.
	Curation CurationLevel `json:"curation"`

	// RecordID from a data source. We try our best to return ID that
	// corresponds to dwc:taxonID of a DataSource. If such ID is not provided,
	// this ID will be auto-generated. Auto-generated IDs will
	// have 'gn_' prefix.
	RecordID string `json:"recordId"`

	// GlobalID that is exposed globally by a DataSource. Such IDs are usually
	// self-resolved, like for example LSID, pURL, DOI etc.
	GlobalID string `json:"globalId,omitempty"`

	// LocalID used by a DataSource internally. If an OutLink field is provided,
	// LocalID serves as a 'dynamic' component of the URL.
	LocalID string `json:"localId,omitempty"`

	// Outlink to the record in the DataSource. It consists of a 'stable'
	// URL and an appended 'dynamic' LocalID
	Outlink string `json:"outlink,omitempty"`

	// EntryDate is a timestamp created on entry of the data.
	EntryDate string `json:"entryDate"`

	// SortScore is a numeric representation of the whole score.
	// It can be used to find the BestMatch overall, as well as the
	// best match for every data-source.
	//
	// SortScore takes data from all other scores, using the priority
	// sequence from highest to lowest: InfraSpecificRankScore, FuzzyLessScore,
	// CuratedDataScore, AuthorMatchScore, AcceptedNameScore,
	// ParsingQualityScore. Every higher priority trumps everything below.
	// When the final score value is calculated, it is used to
	// sort verification or search results.
	//
	// Comparing this score between results of different verifications will
	// not necessarily be accurate. The score is used for comparison of names
	// from the same result.
	SortScore float64 `json:"sortScore"`

	// ParsingQuality determines how well gnparser was able to break the
	// name-string to its components. 0 - no parse, 1 - clean parse,
	// 2 - some problems, 3 - significant problems.
	ParsingQuality int `json:"-"`

	// MatchedNameID is the UUID v5 derived from the MatchedName.
	MatchedNameID string `json:"matchedNameID"`

	// MatchedName is a name-string from the DataSource that was matched
	// by GNames algorithm.
	MatchedName string `json:"matchedName"`

	// MatchedCardinality is the cardinality of returned name:
	// 0 - No match, virus or hybrid formula,
	// 1 - Uninomial, 2 - Binomial, 3 - Trinomial etc.
	MatchedCardinality int `json:"matchedCardinality"`

	// MatchedCanonicalSimple is a simplified canonical form without ranks for
	// names lower than species, and with omitted hybrid signs for named
	// hybrids. Quite often simple canonical is the same as full canonical.
	// Hybrid signs are preserved for hybrid formulas.
	MatchedCanonicalSimple string `json:"matchedCanonicalSimple,omitempty"`

	// MatchedCanonicalFull is a canonical form that preserves hybrid signs
	// and infraspecific ranks.
	MatchedCanonicalFull string `json:"matchedCanonicalFull,omitempty"`

	// MatchedAuthors is a list of authors mentioned in the name.
	MatchedAuthors []string `json:"-"`

	// MatchedYear is a year mentioned in the name. Multiple years or
	// approximate years are ignored.
	MatchedYear int `json:"-"`

	// CurrentRecordID is the id of currently accepted name given by
	// the data-source.
	CurrentRecordID string `json:"currentRecordId"`

	// CurrentNameID is the UUID v5 derived from the CurrentName.
	CurrentNameID string `json:"currentNameId"`

	// CurrentName is a currently accepted name (it is only provided by
	// DataSources with taxonomic data).
	CurrentName string `json:"currentName"`

	// CurrentCardinality is a cardinality of the accepted name.
	// It might differ from the matched name cardinality.
	CurrentCardinality int `json:"currentCardinality"`

	// CurrentCanonicalSimple is a canonical form for the currently accepted name.
	CurrentCanonicalSimple string `json:"currentCanonicalSimple"`

	// CurrentCanonicalFull is a full version of canonical form for the
	// currently accepted name.
	CurrentCanonicalFull string `json:"currentCanonicalFull"`

	// TaxonomicStatus provides taxonomic status of a name.
	// Can be "Accepted", "Synonym", "N/A".
	TaxonomicStatus TaxonomicStatus `json:"taxonomicStatus"`

	// DEPRECATED: use TaxonomicStatus instead.
	// IsSynonym is a boolean value that is true if the matched name is a
	// synonym according to the data source.
	IsSynonym bool `json:"isSynonym"`

	// ClassificationPath to the name (if provided by the DataSource).
	// Classification path consists of a hierarchy of name-strings.
	ClassificationPath string `json:"classificationPath,omitempty"`

	// ClassificationRanks of the classification path. They follow the
	// same order as the classification path.
	ClassificationRanks string `json:"classificationRanks,omitempty"`

	// ClassificationIDs of the name-strings. They always correspond to
	// the "id" field.
	ClassificationIDs string `json:"classificationIds,omitempty"`

	// EditDistance is a Levenshtein edit distance between canonical form of the
	// input name-string and the matched canonical form. If match type is
	// "EXACT", edit-distance will be 0.
	EditDistance int `json:"editDistance"`

	// StemEditDistance is a Levenshtein edit distance after removing suffixes
	// from specific epithets from canonical forms.
	StemEditDistance int `json:"stemEditDistance"`

	// MatchType describes what kind of a match happened to a name-string.
	MatchType MatchTypeValue `json:"matchType"`

	// ScoreDetails provides data about matching of authors, year, rank,
	// parsingQuality...
	ScoreDetails `json:"scoreDetails"`

	// Vernacular names that correspond to the matched name. (Will be implemented
	// later)
	Vernaculars []Vernacular `json:"vernaculars,omitempty"`
}
ResultData holds the data returned in the `BestResult` or `Results` fields of a name verification.
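EditDistance and StemEditDistance are described as Levenshtein distances. For readers unfamiliar with the metric, a textbook dynamic-programming version is sketched below. The function names are hypothetical and this is not the implementation GNames uses; in particular, stemming and any matching thresholds are not modelled.

```go
package main

import "fmt"

// min3 returns the smallest of three integers.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the number of single-rune insertions, deletions,
// and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// prev[j] is the distance between the processed prefix of a and rb[:j].
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("Pomatomus saltatrix", "Pomatomus saltatrix")) // identical strings
	fmt.Println(levenshtein("kitten", "sitting"))
}
```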
type ScoreDetails ¶ added in v0.6.5
type ScoreDetails struct {
	// CardinalityScore is 1 when cardinality of input name and match name
	// match and neither cardinality is 0. In all other cases this score
	// equals 0.
	CardinalityScore float32 `json:"cardinalityScore"`

	// InfraSpecificRankScore matches infraspecific rank. For example if a
	// query name is `Aus bus var. cus`, and the match has the same rank,
	// this field is 1.
	InfraSpecificRankScore float32 `json:"infraSpecificRankScore"`

	// FuzzyLessScore scores edit distance for fuzzy matching. If edit distance
	// is 0 the score is maxed to 1.
	FuzzyLessScore float32 `json:"fuzzyLessScore"`

	// CuratedDataScore scores highest if the matched data-source is known for
	// having a significant manual curation effort of the data.
	CuratedDataScore float32 `json:"curatedDataScore"`

	// AuthorMatchScore tries to match authors and years in the name. If
	// a year and all authors match, the score is 1.
	AuthorMatchScore float32 `json:"authorMatchScore"`

	// AcceptedNameScore is a binary field, if matched name is also currently
	// accepted name according to the data-source, the value is 1.
	AcceptedNameScore float32 `json:"acceptedNameScore"`

	// ParsingQualityScore is the highest for matched names that were parsed
	// without any problems.
	ParsingQualityScore float32 `json:"parsingQualityScore"`
}
ScoreDetails explains how results are sorted and why a particular result was selected as the `BestResult`. The score for every item is normalized to a range from 0 to 1, where 0 means there was no match by that factor and 1 means a "perfect" match. Fields located higher on the list have more weight than lower fields: lower fields are taken into account only if higher fields provide equal values. For all scores 1 is the best, 0 is the worst.
type TaxonomicStatus ¶ added in v0.35.0
type TaxonomicStatus int
const (
	UnknownTaxStatus TaxonomicStatus = iota
	AcceptedTaxStatus
	SynonymTaxStatus
)
func New ¶ added in v0.35.0
func New(txStatus string) TaxonomicStatus
func (TaxonomicStatus) MarshalJSON ¶ added in v0.38.0
func (ts TaxonomicStatus) MarshalJSON() ([]byte, error)
MarshalJSON implements the json.Marshaler interface and converts a TaxonomicStatus into a string.
func (TaxonomicStatus) String ¶ added in v0.35.0
func (ts TaxonomicStatus) String() string
func (*TaxonomicStatus) UnmarshalJSON ¶ added in v0.38.0
func (ts *TaxonomicStatus) UnmarshalJSON(bs []byte) error
UnmarshalJSON implements the json.Unmarshaler interface and converts a string into a TaxonomicStatus.
type Vernacular ¶
type Vernacular struct {
	Name string `json:"name"`

	// Language of the name, hopefully in ISO form.
	Language string `json:"language,omitempty"`

	// Locality is the geographic place where the name is used.
	Locality string `json:"locality,omitempty"`
}
Vernacular is a vernacular (common) name that corresponds to a matched name.